
Intermittent segfaults when constructing many tempopulsar objects #58

@ldunn

Description


I have been encountering intermittent segfaults when running code where I repeatedly (say, many hundreds of times) make calls like

psr = libstempo.tempopulsar(parfile, timfile)
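
For concreteness, the pattern that triggers the crash is roughly the following (the par/tim filenames below are just placeholders for the real data files, and the loop count is illustrative):

import libstempo

parfile, timfile = "pulsar.par", "pulsar.tim"  # placeholder filenames

for _ in range(1000):
    # each iteration builds and discards a fresh tempopulsar object;
    # after some hundreds of iterations the process occasionally segfaults
    psr = libstempo.tempopulsar(parfile, timfile)
    del psr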

Inspecting the core dumps shows that the segfault comes from deep inside tempo2, and the root cause is that the value of ne_sw_ifuncN in the pulsar struct has become garbage. The end of the backtrace typically looks like the following:

#0  0x000014fe6096144d in ifunc (mjd=mjd@entry=0x3b257d8, yoffs=yoffs@entry=0x3b27718, t=t@entry=57191.436434533738, N=465682051) at ifunc.C:38
#1  0x000014fe6090c421 in dm_delays (psr=<optimized out>, npsr=<optimized out>, p=<optimized out>, i=<optimized out>, delt=<optimized out>, dt_SSB=<optimized out>) at dm_delays.C:324
#2  0x000014fe608ed430 in calculate_bclt._omp_fn.0(void) () at calculate_bclt.C:143
#3  0x000014fe6060b736 in GOMP_parallel (fn=0x14fe608ed150 <calculate_bclt._omp_fn.0(void)>, data=0x7ffe2e859200, num_threads=1, flags=0) at ../../../libgomp/parallel.c:178
#4  0x000014fe608ed97b in calculate_bclt (psr=0x35c4f10, npsr=1) at calculate_bclt.C:63
#5  0x000014fe6093ac3e in formBatsAll (psr=0x35c4f10, npsr=1) at global.C:148
#6  0x000014fe60c7f2aa in __pyx_pf_9libstempo_9libstempo_11tempopulsar___cinit__ (__pyx_v_obsfreq=<optimized out>, __pyx_v_observatory=<optimized out>, __pyx_v_toaerrs=<optimized out>,
    __pyx_v_toas=<optimized out>, __pyx_v_t2cmethod=<optimized out>, __pyx_v_clk=<optimized out>, __pyx_v_ephem=<optimized out>, __pyx_v_units=<optimized out>,
    __pyx_v_maxobs=<optimized out>, __pyx_v_dofit=<optimized out>, __pyx_v_fixprefiterrors=<optimized out>, __pyx_v_warnings=<optimized out>, __pyx_v_timfile=0x14fe60e614d0,
    __pyx_v_parfile=<optimized out>, __pyx_v_self=0x14fe6112c540) at libstempo/libstempo.cpp:32805

I'm filing this as a libstempo bug because a tentative fix seems to be adding a memset call to tempopulsar's __cinit__ to zero out the newly allocated memory, as I've done here. However, I'm not experienced enough with C or Cython to know whether this is a good way to handle it, or whether this is even really a libstempo bug as opposed to something going wrong in tempo2's memory management.
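
For reference, the workaround I'm testing amounts to something like the sketch below inside tempopulsar.__cinit__ (the allocation details and variable names here are illustrative, not the actual libstempo code):

from libc.stdlib cimport malloc
from libc.string cimport memset

# immediately after the pulsar struct array is allocated, and before any
# tempo2 routines are called on it:
cdef pulsar *psr_array = <pulsar *>malloc(sizeof(pulsar) * npsr)
if psr_array == NULL:
    raise MemoryError()

# zero the freshly allocated memory so that any fields tempo2 reads before
# writing (e.g. ne_sw_ifuncN) start out as 0 rather than malloc garbage
memset(psr_array, 0, sizeof(pulsar) * npsr)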

This may also be the cause of nanograv/enterprise#339?
