</sentence> sentid="" output when assume_input_is_tokenized=on #7

@oktaal


In the Makefile.start_server script, changing the line

assume_input_is_tokenized=off\

to assume_input_is_tokenized=on makes the output malformed.

For example:

$ make -f Makefile.start_server 
PROLOGMAXSIZE=1500M /opt/Alpino-git233/bin/Alpino -notk -veryfast user_max=20000\
            server_kind=parse\
            server_port=42424\
            assume_input_is_tokenized=on\
            debug=1\
            -init_dict_p\
            batch_command=alpino_server\
    	2> /alpino_server.log &

$ telnet localhost 42424
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hallo wereld .
top/top|top/hd|hallo/[0,1]|127.0.0.1
hallo/[0,1]|tag/nucl|wereld/[1,2]|127.0.0.1
/[2,3]|127.0.0.1app|.
<?xml version="1.0" encoding="UTF-8"?>
<alpino_ds version="1.6">
  <parser build="Alpino-x86_64-linux-glibc2.5-git233-sicstus" date="2021-02-04T16:52" cats="1" skips="0" />
  <node begin="0" cat="top" end="3" id="0" rel="top">
    <node begin="0" cat="du" end="3" id="1" rel="--">
      <node begin="0" end="1" frame="tag" his="normal" his_1="normal" id="2" lcat="advp" lemma="hallo" pos="tag" postag="TSW()" pt="tsw" rel="tag" root="hallo" sense="hallo" word="hallo"/>
      <node begin="1" cat="np" end="3" id="3" rel="nucl">
        <node begin="1" end="2" frame="noun(de,count,sg)" gen="de" genus="zijd" getal="ev" graad="basis" his="normal" his_1="normal" id="4" lcat="np" lemma="wereld" naamval="stan" ntype="soort" num="sg" pos="noun" postag="N(soort,ev,basis,zijd,stan)" pt="n" rel="hd" rnum="sg" root="wereld" sense="wereld" word="wereld"/>
"/>pecial="hoofd" word=".m" positie="vrij" postag="TW(hoofd,vrij)" pt="tw" rel="app" root=".ssion" id="5" infl="both" lcat="detp" lemma=".
      </node>
    </node>
  </node>
</sentence> sentid="127.0.0.1">hallo wereld .
</alpino_ds>
Connection closed by foreign host.

Leaving assume_input_is_tokenized set to off does give a correctly formatted sentence element: <sentence sentid="127.0.0.1">hallo wereld .</sentence>.

I have to implement a workaround anyway to support older Alpino versions, so this isn't a blocking issue for me. But I was wondering whether there is some setting I'm missing that would prevent this from happening? I couldn't figure out where in the Alpino code this goes wrong.
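One possible explanation, not confirmed against the Alpino source: telnet terminates each line with CR LF, and with assume_input_is_tokenized=on the carriage return may stay attached to the last token. A terminal would then draw the text following the CR over the start of the same line, which reproduces the garbled lines in the transcript. The Python sketch below illustrates that effect and a client-side workaround that sends a bare LF. The helper names `render_terminal_line`, `to_wire` and `parse_sentence` are hypothetical, and the assumption that the server accepts one LF-terminated sentence and then closes the connection is taken only from the telnet session shown here.

```python
import socket


def render_terminal_line(raw: str) -> str:
    """Simulate how a terminal shows one line that contains carriage returns.

    A CR moves the cursor back to column 0, so later characters
    overwrite the characters already drawn on that line.
    """
    cells = []
    cursor = 0
    for ch in raw:
        if ch == "\r":
            cursor = 0
            continue
        if cursor < len(cells):
            cells[cursor] = ch
        else:
            cells.append(ch)
        cursor += 1
    return "".join(cells)


def to_wire(sentence: str) -> bytes:
    """Encode one tokenized sentence with a bare LF terminator (no CR)."""
    cleaned = sentence.replace("\r", "").rstrip("\n")
    return cleaned.encode("utf-8") + b"\n"


def parse_sentence(sentence: str, host: str = "localhost", port: int = 42424) -> str:
    """Send one sentence to the server and return its full response.

    Assumption: the server replies and then closes the connection,
    as in the telnet session above. Any CR in the reply is dropped.
    """
    with socket.create_connection((host, port)) as conn:
        conn.sendall(to_wire(sentence))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").replace("\r", "")


if __name__ == "__main__":
    # What the server might actually have sent if the CR stuck to the "." token:
    raw = '  <sentence sentid="127.0.0.1">hallo wereld .\r</sentence>'
    print(render_terminal_line(raw))
    # prints: </sentence> sentid="127.0.0.1">hallo wereld .
```

If this guess is right, the XML on the wire is still well-formed apart from the stray CR inside the last token, so stripping CR characters on the client side (before sending, after receiving, or both) should be enough regardless of the Alpino version.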
