Dealing with large data set #5

@londumas

I am trying to run on a large data set, ~200,000 eBOSS spectra, and stumbled upon a memory issue.
What would be the best strategy to deal with that?
Is there a float32 option, or should I split the spectra I am computing in half along lambdaRF and stitch the two halves back together as best I can afterwards?
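
For concreteness, the float32 route I have in mind would look something like this (a sketch only, assuming `pcaflux` and `pcaivar` are the arrays built by `redvsblue_compute_PCA.py`, and that empca does not upcast them internally, which I have not checked):

```python
import numpy as np
import empca

# pcaflux / pcaivar: the (nspec, nwave) flux and inverse-variance
# arrays assembled by redvsblue_compute_PCA.py. Casting to float32
# halves their memory footprint before the EMPCA iterations start.
pcaflux = pcaflux.astype(np.float32)
pcaivar = pcaivar.astype(np.float32)
model = empca.empca(pcaflux, weights=pcaivar, niter=args.niter, nvec=args.nvec)
```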

```
INFO: Starting EMPCA
       iter        R2             rchi2
Traceback (most recent call last):
  File "<HOME>/redvsblue/bin//redvsblue_compute_PCA.py", line 205, in <module>
    model = empca.empca(pcaflux, weights=pcaivar, niter=args.niter, nvec=args.nvec)
  File "<HOME>/Programs/sbailey/empca/empca.py", line 307, in empca
    model.solve_eigenvectors(smooth=smooth)
  File "<HOME>/Programs/sbailey/empca/empca.py", line 142, in solve_eigenvectors
    data -= np.outer(self.coeff[:,k], self.eigvec[k])
  File "<HOME>/.local/lib/python3.6/site-packages/numpy/core/numeric.py", line 1203, in outer
    return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis, :], out)
MemoryError
```
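
The MemoryError is raised where `solve_eigenvectors` subtracts the rank-1 update: `np.outer` materializes the full (nspec, nwave) product in one allocation before the in-place subtraction. One possible workaround would be to do that subtraction in row chunks, so only a small temporary exists at any time. A sketch of the idea (`subtract_outer_inplace` is a hypothetical helper, not something that exists in empca today):

```python
import numpy as np

def subtract_outer_inplace(data, coeff, eigvec, chunk=4096):
    """Equivalent to `data -= np.outer(coeff, eigvec)`, in row chunks.

    Only a (chunk, nwave) temporary is allocated at a time, instead of
    the full (nspec, nwave) outer product, at the cost of a short
    Python loop over the chunks.
    """
    for i in range(0, data.shape[0], chunk):
        data[i:i + chunk] -= coeff[i:i + chunk, np.newaxis] * eigvec[np.newaxis, :]
```

Inside empca.py's `solve_eigenvectors`, the failing line would then become `subtract_outer_inplace(data, self.coeff[:,k], self.eigvec[k])`.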
