
Lost connection to server at line 4 of Top-Level:rdf_loader_run() #1348

Open
yanshuqiiii opened this issue Apr 7, 2025 · 4 comments

@yanshuqiiii

yanshuqiiii commented Apr 7, 2025

When importing Freebase data into Virtuoso, the following error occurs:

SQL> rdf_loader_run();

*** Error 08S01: [Virtuoso Driver]CL065: Lost connection to server
at line 2 of Top-Level:
rdf_loader_run()
The server log shows:

21:16:15 WARNING: * Monitor: Locks are held for a long time

21:17:35 WARNING: * Monitor: Many lock waits

21:19:35 WARNING: * Monitor: Many lock waits

21:21:36 WARNING: * Monitor: Many lock waits

21:23:36 WARNING: * Monitor: Many lock waits

21:25:36 WARNING: * Monitor: Many lock waits

21:27:58 WARNING: * Monitor: Many lock waits

21:29:58 WARNING: * Monitor: Many lock waits

21:31:58 WARNING: * Monitor: Many lock waits

21:33:58 WARNING: * Monitor: Many lock waits

21:35:58 WARNING: * Monitor: Many lock waits

How can I solve this?

@HughWilliams
Collaborator

HughWilliams commented Apr 7, 2025

What Freebase dataset(s) are being loaded, and what is their size in number of triples? Please also provide the link you downloaded it from, if available.

Have you done any RDF Performance Tuning of the Virtuoso instance?

Is the Virtuoso server still running, or has it shut down and needs to be restarted? Do the OS system logs show any Virtuoso errors that might have caused it to shut down, resulting in the Error 08S01: [Virtuoso Driver]CL065: Lost connection to server error in isql?

If the Virtuoso server is still running, or while a bulk load attempt is in progress before the connection is lost, please provide the output of the status(); command run from isql for review.

Please provide a copy of the virtuoso.ini and virtuoso.log files for review.
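One way to check the OS logs mentioned above is to scan them for out-of-memory killer entries that mention the Virtuoso process. The following is only a minimal sketch: the message patterns are assumptions (kernel wording varies between versions), the function name and the log path in the usage comment are hypothetical, and the sample lines are made up for illustration rather than taken from the reporter's machine.

```python
import re

# Assumed message shapes; the exact kernel wording varies between versions.
OOM_PATTERN = re.compile(r"out of memory|oom-kill|killed process", re.IGNORECASE)


def find_oom_lines(log_lines, process_hint="virtuoso"):
    """Return lines that look like OOM-killer events mentioning the process."""
    hint = process_hint.lower()
    return [
        line.rstrip("\n")
        for line in log_lines
        if OOM_PATTERN.search(line) and hint in line.lower()
    ]


# Made-up sample lines, for illustration only.
sample = [
    "kernel: Out of memory: Killed process 50977 (virtuoso-t)\n",
    "kernel: usb 1-1: new device\n",
]
matches = find_oom_lines(sample)

# Hypothetical usage against a real log file (path is an assumption):
# with open("/var/log/kern.log") as log_file:
#     for hit in find_oom_lines(log_file):
#         print(hit)
```

If any line is returned for the time of the failed load, that would suggest the server was killed by the operating system rather than exiting on its own.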

@timhaynesopenlink
Collaborator

In addition to Hugh's response, I also suggest:

  • run the server as ./virtuoso -df so we can see full debug console output
  • before starting the bulk-load, in isql, run trace_on('errors'); so more error states are reported

Thanks

@yanshuqiiii
Author

yanshuqiiii commented Apr 8, 2025

Thank you for your reply. The requested files are provided below.
The dataset was downloaded from the following link: Freebase Triples: https://developers.google.com/freebase?hl=zh-cn. The data was filtered with the following script:

import re
import sys

ns = "http://rdf.freebase.com/ns/"
xml = "http://www.w3.org/2001/XMLSchema"

# Lines starting with "@" are prefix declarations and pass through unchanged.
prefixes = re.compile("@")
# Raw strings avoid invalid escape sequences, and re.escape() keeps the dots in
# the namespace URLs literal. The final full stop is escaped in every pattern.
# Entity subject, Freebase predicate, Freebase object.
re_ns_ns = re.compile(r"^<{0}[mg]\.[^>]+>\t<{0}[^>]+>\t<{0}[^>]+>\t\.$".format(re.escape(ns)))
# Entity subject, Freebase predicate, quoted literal (optionally tagged @en).
re_ns_en = re.compile(r"^<{0}[mg]\.[^>]+>\t<{0}[^>]+>\t[\'\"](?!/).+[\'\"](?:@en)?\t\.$".format(re.escape(ns)))
# Entity subject, Freebase predicate, literal typed with an XML Schema datatype.
re_ns_xml = re.compile(r"^<{0}[mg]\.[^>]+>\t<{0}[^>]+>\t.+<{1}#\w+>\t\.$".format(re.escape(ns), re.escape(xml)))

line_number = 0
for line in sys.stdin:
    line_number += 1
    line = line.rstrip()
    if line == "":
        sys.stdout.write("\n")
    elif prefixes.match(line):
        sys.stdout.write(line + "\n")
    elif line[-1] != ".":
        sys.stderr.write("No full stop: skipping line %d\n" % line_number)
    else:
        parts = line.split("\t")
        # A triple line has four tab-separated fields: subject, predicate, object, ".".
        if len(parts) != 4 or any(p.strip() == "" for p in parts[:3]):
            sys.stderr.write("Expected 3 terms plus full stop: skipping line %d\n" % line_number)
        elif re_ns_en.search(line) or re_ns_ns.search(line) or re_ns_xml.search(line):
            sys.stdout.write(line + "\n")

    # Flush periodically so skip messages show up while a long run is in progress.
    if line_number % 1000000 == 0:
        sys.stderr.flush()
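To illustrate what the filter keeps, here is a small self-contained sketch of the entity-to-entity pattern applied to two lines. The identifiers in the sample triples (m.01, type.object.type, people.person) and the helper name are hypothetical, chosen only to show the expected tab-separated shape.

```python
import re

NS = "http://rdf.freebase.com/ns/"

# Subject, predicate and object must all be IRIs in the Freebase namespace,
# separated by tabs and followed by a tab and a full stop.
ENTITY_TRIPLE = re.compile(
    r"^<{0}[mg]\.[^>]+>\t<{0}[^>]+>\t<{0}[^>]+>\t\.$".format(re.escape(NS))
)


def is_entity_triple(line):
    """Return True if the line has the entity-to-entity triple shape."""
    return ENTITY_TRIPLE.search(line.rstrip()) is not None


# Hypothetical sample lines, for illustration only.
kept = "<{0}m.01>\t<{0}type.object.type>\t<{0}people.person>\t.".format(NS)
dropped = "<{0}m.01>\t<http://www.w3.org/2000/01/rdf-schema#label>\t\"x\"@en\t.".format(NS)

print(is_entity_triple(kept))
print(is_entity_triple(dropped))
```

The second line is rejected because its predicate is outside the Freebase namespace, so the filtered file contains fewer triples than the original dump.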

RDF Performance Tuning:

NumberOfBuffers          = 400000
MaxDirtyBuffers          = 350000

virtuoso log:

14:58:35 INFO: { Loading plugin 1: Type `plain', file `wikiv' in `../hosting'
14:58:35 INFO:   WikiV version 0.6 from OpenLink Software
14:58:35 INFO:   Support functions for WikiV collaboration tool
14:58:35 INFO:   SUCCESS plugin 1: loaded from ../hosting/wikiv.so }
14:58:35 INFO: { Loading plugin 2: Type `plain', file `mediawiki' in `../hosting'
14:58:35 INFO:   MediaWiki version 0.1 from OpenLink Software
14:58:35 INFO:   Support functions for MediaWiki collaboration tool
14:58:35 INFO:   SUCCESS plugin 2: loaded from ../hosting/mediawiki.so }
14:58:35 INFO: { Loading plugin 3: Type `plain', file `creolewiki' in `../hosting'
14:58:35 INFO:   CreoleWiki version 0.1 from OpenLink Software
14:58:35 INFO:   Support functions for CreoleWiki collaboration tool
14:58:35 INFO:   SUCCESS plugin 3: loaded from ../hosting/creolewiki.so }
14:58:35 INFO: { Loading plugin 4: Type `plain', file `im' in `../hosting'
14:58:35 INFO:   IM version 0.62 from OpenLink Software
14:58:35 INFO:   Support functions for Image Magick 6.9.9
14:58:35 INFO:   SUCCESS plugin 4: loaded from ../hosting/im.so }
14:58:35 INFO: OpenLink Virtuoso Universal Server
14:58:35 INFO: Version 07.20.3229-pthreads for Linux as of Aug 15 2018
14:58:35 INFO: uses parts of OpenSSL, PCRE, Html Tidy
14:58:35 INFO: SQL Optimizer enabled (max 1000 layouts)
14:58:36 INFO: Compiler unit is timed at 0.001404 msec
14:58:45 INFO: Checkpoint started
14:58:45 INFO: Roll forward started
14:58:45 INFO: Roll forward complete
14:58:46 INFO: Checkpoint started
14:58:46 INFO: Checkpoint finished, log reused
14:58:46 INFO: Checkpoint started
14:58:46 INFO: Checkpoint finished, log reused
14:58:46 INFO: Checkpoint started
14:58:46 INFO: Checkpoint finished, log reused
14:58:47 INFO: Checkpoint started
14:58:47 INFO: Checkpoint finished, log reused
14:58:47 INFO: Checkpoint started
14:58:47 INFO: Checkpoint finished, log reused
14:58:48 INFO: PL LOG: Installing Virtuoso Conductor version 1.00.8783 (DAV)
14:58:48 INFO: PL LOG: Installing with dependencies Virtuoso Conductor version 1.00.8783/2018-08-15 20:02 (DAV)
14:58:48 INFO: Checkpoint started
14:58:48 INFO: Checkpoint finished, log reused
14:58:55 INFO: Checkpoint started
14:58:55 INFO: Checkpoint finished, log reused
14:58:55 INFO: PL LOG: Installation with dependencies complete
14:58:55 INFO: Checkpoint started
14:58:55 INFO: Checkpoint finished, log reused
14:58:55 INFO: HTTP/WebDAV server online at 8890
14:58:55 INFO: Server online at 1111 (pid 50977)
15:02:07 INFO: PL LOG: Loader started
Killed

status:

SQL> status();
REPORT
VARCHAR
_______________________________________________________________________________

OpenLink Virtuoso  Server
Version 07.20.3229-pthreads for Linux as of Aug 15 2018 
Started on: 2025-04-08 14:58 GMT+8
 
Database Status:
  File size 83886080, 10240 pages, 6422 free.
  400000 buffers, 3714 used, 1684 dirty 0 wired down, repl age 0 0 w. io 0 w/crsr.
  Disk Usage: 68 reads avg 0 msec, 0% r 0% w last  97 s, 2521 writes flush        136 MB,
    0 read ahead, batch = 0.  Autocompact 297 in 217 out, 26% saved.
Gate:  0 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap. 
Log = ../database/virtuoso.trx, 492 bytes
2073 pages have been changed since last backup (in checkpoint state)
Current backup timestamp: 0x0000-0x00-0x00
Last backup date: unknown
Clients: 2 connects, max 2 concurrent
RPC: 13 calls, 2 pending, 1 max until now, 0 queued, 0 burst reads (0%), 0 second 90M large, 315M max
Checkpoint Remap 38 pages, 0 mapped back. 0 s atomic time.
    DB master 10240 total 6382 free 38 remap 10 mapped back
   temp  256 total 251 free
 
Lock Status: 0 deadlocks of which 0 2r1w, 0 waits,
   Currently 3 threads running 0 threads waiting 0 threads in vdb.
Pending:
 
Client 1111:2:  Account: dba, 316 bytes in, 2712 bytes out, 1 stmts.
PID: 51041, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks: 
 
Client 1111:1:  Account: dba, 487 bytes in, 410 bytes out, 1 stmts.
PID: 51009, OS: unix, Application: unknown, IP#: 127.0.0.1
Transaction status: PENDING, 1 threads.
Locks: 
 
 
Running Statements:
 Time (msec) Text
       34636 rdf_loader_run()
         411 status()
 
 
Hash indexes
 

43 Rows. -- 413 msec.

@HughWilliams
Collaborator

HughWilliams commented Apr 8, 2025

The status output shows you are running an old Virtuoso Version 07.20.3229-pthreads for Linux as of Aug 15 2018. We would recommend upgrading to the latest open source version, as there have been a number of memory and other improvements in Virtuoso in the latest releases, and trying again.

Also, when was the status() command run? The virtuoso.log shows the Virtuoso server was Killed during the load, so I assume the server was restarted and the load resumed before the status() command was run.

The status() output shows 400000 buffers, 3714 used, indicating you have 400K memory buffers allocated to Virtuoso, which is suitable for a machine with about 5GB RAM. How much memory is available on the machine? The indicated Freebase dataset is reported to consist of about 2 billion triples, which would typically require about 20GB RAM to host in memory for best performance. If Virtuoso runs out of buffers, the server will start swapping between memory and disk, which will kill performance and could lead to system out-of-memory errors, resulting in the Virtuoso server being Killed, as you are experiencing. The system kernel logs should show the reason the Virtuoso server was killed.
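As a rough illustration of the buffer sizing discussed above, the sketch below estimates NumberOfBuffers and MaxDirtyBuffers from the machine's RAM. The constants are assumptions based on a common rule of thumb (about two thirds of RAM for buffers, roughly 8 KB per buffer, MaxDirtyBuffers at about three quarters of NumberOfBuffers) and the function names are hypothetical; the values should be checked against the Virtuoso performance tuning documentation before being put into virtuoso.ini.

```python
BUFFER_BYTES = 8000   # assumed approximate memory cost of one buffer
RAM_FRACTION = 0.66   # assumed share of system RAM given to buffers


def suggest_buffers(ram_gb):
    """Estimate (NumberOfBuffers, MaxDirtyBuffers) for a machine with ram_gb of RAM."""
    ram_bytes = ram_gb * 1024 ** 3
    number_of_buffers = int(ram_bytes * RAM_FRACTION / BUFFER_BYTES)
    max_dirty_buffers = number_of_buffers * 3 // 4
    return number_of_buffers, max_dirty_buffers


def ram_gb_for_buffers(number_of_buffers):
    """Estimate the RAM size (in GB) that a given NumberOfBuffers setting targets."""
    return number_of_buffers * BUFFER_BYTES / RAM_FRACTION / 1024 ** 3


# The reporter's setting of 400000 buffers corresponds to a machine with
# roughly 4.5 GB of RAM under these assumptions.
print(round(ram_gb_for_buffers(400000), 1))

# Estimated settings for a 20 GB machine.
print(suggest_buffers(20))
```

Under these assumptions, a 20 GB machine would use well over a million buffers, several times the 400000 currently configured.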
