Dealing with large data set #5

@londumas

I am trying to run on a large data set, ~200,000 eBOSS spectra, and stumbled upon a memory issue.
What would be the best strategy to deal with that?
Is there a float32 option, or should I split the spectra I am computing in half along lambdaRF and stitch the two halves back together as best I can afterwards?
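
For concreteness, the float32 route I have in mind would look something like this (a sketch only, assuming `pcaflux` and `pcaivar` are the arrays built by `redvsblue_compute_PCA.py`, and that empca does not upcast them internally, which I have not checked):

```python
import numpy as np
import empca

# pcaflux / pcaivar: the (nspec, nwave) flux and inverse-variance
# arrays assembled by redvsblue_compute_PCA.py. Casting to float32
# halves their memory footprint before the EMPCA iterations start.
pcaflux = pcaflux.astype(np.float32)
pcaivar = pcaivar.astype(np.float32)
model = empca.empca(pcaflux, weights=pcaivar, niter=args.niter, nvec=args.nvec)
```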

```
INFO: Starting EMPCA
       iter        R2             rchi2
Traceback (most recent call last):
  File "<HOME>/redvsblue/bin//redvsblue_compute_PCA.py", line 205, in <module>
    model = empca.empca(pcaflux, weights=pcaivar, niter=args.niter, nvec=args.nvec)
  File "<HOME>/Programs/sbailey/empca/empca.py", line 307, in empca
    model.solve_eigenvectors(smooth=smooth)
  File "<HOME>/Programs/sbailey/empca/empca.py", line 142, in solve_eigenvectors
    data -= np.outer(self.coeff[:,k], self.eigvec[k])
  File "<HOME>/.local/lib/python3.6/site-packages/numpy/core/numeric.py", line 1203, in outer
    return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis, :], out)
MemoryError
```
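
The MemoryError is raised where `solve_eigenvectors` subtracts the rank-1 update: `np.outer` materializes the full (nspec, nwave) product in one allocation before the in-place subtraction. One possible workaround would be to do that subtraction in row chunks, so only a small temporary exists at any time. A sketch of the idea (`subtract_outer_inplace` is a hypothetical helper, not something that exists in empca today):

```python
import numpy as np

def subtract_outer_inplace(data, coeff, eigvec, chunk=4096):
    """Equivalent to `data -= np.outer(coeff, eigvec)`, in row chunks.

    Only a (chunk, nwave) temporary is allocated at a time, instead of
    the full (nspec, nwave) outer product, at the cost of a short
    Python loop over the chunks.
    """
    for i in range(0, data.shape[0], chunk):
        data[i:i + chunk] -= coeff[i:i + chunk, np.newaxis] * eigvec[np.newaxis, :]
```

Inside empca.py's `solve_eigenvectors`, the failing line would then become `subtract_outer_inplace(data, self.coeff[:,k], self.eigvec[k])`.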
