Suggested merge items for 1.6
Ian Stewart, 26 May 2016: A large number of PRs have accumulated in the LIME repo, many of them mine. We should look at trying to merge some of them, and look towards the next release, which would be 1.6 I guess. Sébastien asked me to circulate among you a list of which are, in my opinion, 'the most urgent issues / proposed enhancements'.
The short answer is, I would like to see all of my own PRs make it into the next LIME version. No surprise perhaps...
The PRs can argue for themselves on a one-to-one basis, but perhaps it would help if I approach the issue from another direction, i.e. list the ways in which I feel the code needs to improve. (I won't mention pure bug fixes, since the need to include these should be self-evident.)
- There is a great deal of room for improvement in the self-documentation of LIME: there are far too many obscure, misleading and single-letter variable names.
- Not only could the way the various functions are called in main() be better arranged, but so could the way the code is parcelled into functions. A function such as molinit(), for example, performs a number of different operations on different variables, and could usefully be split up, to the great improvement of the readability of the code.
Such issues don't cause the user any problems, but they can make life far more difficult for the people tasked with maintaining and improving the software. There is a long list of bugs and traps found in LIME over the past year or so; we must not kid ourselves that there will not be more. Every bit of added code naturally also carries the chance of adding more bugs. Bugs are a fact of life, but the easier it is to find them, or, on the flip side, to avoid adding new ones, the happier I think we will all be.
Many of my PRs address these problems on the side, as it were. The only PR which directly addresses a structural issue is #77, which separates the strictly user-settable parameters from values which the user is not meant to set, but which contain configuration information. In a situation in which the user has access to the source code, of course one cannot prevent them doing whatever they like with it, but nevertheless this is an important distinction, and again is something which would make life easier for code maintainers. Also, the more we add interface possibilities which move away from the present joint-compilation model, the more important this distinction will become.
At some later time I strongly suggest a rearrangement of the way memory is allocated for the grid struct (see discussion in #81). This would supersede at least the memory issues of Sergey's PR #64. We can't do everything at once though, and my suggestion would be to wait until after 1.6 before undertaking such a drastic modification to LIME.
In general I think modifications to LIME should, if at all possible, not be single-purpose things which are simply tacked on. If people want to use the software in new ways, which will happen from time to time, then ideally the code would be modified so as to become more generic and protean. Rather than add specific ability A into LIME, to be invoked by some parameter added to the already quite lengthy list of user parameters, it is better to adapt LIME so the user can themselves implement A, B or C as they choose in the model file.
I do not think the present system of storing grid point data in various sorts of ASCII or bespoke-format binary file is really sustainable. FITS would be a far better format than either; however, at the telecon we had in March, people were not enthusiastic about FITS and preferred HDF5. I can see the point of this, but I think there is an issue beyond the immediate question of file formats, and that is the more abstract one of a data model. I strongly suggest the formulation of a data model for the information stored in the grid struct. Ideally we would have an interface available, not tied to any file format, which would allow grid data at several stages of completion to be exchanged with a disk file; this interface layer could then talk to a FITS layer or an HDF5 layer, whichever is provided or present. Anything would be better than the present vagueness and incompleteness. To see the point and use of a clearly defined data model, consider why FITS itself has been so successful: not because it is a good file format (it isn't) but largely because it has been very minutely defined and documented.
By 'data model' I mean (i) identifying several stages of completion of the grid information (bare positions, triangulated, +user function values, +populations would be the 4 stages I would suggest), and (ii) specifying exactly what a file record of the grid at each of these stages should contain.
I plan to modify my PR #92 to split the general IO interface mentioned above from the FITS-specific parts. Then if somebody wanted to write an HDF5 layer at a later date to plug into this, that should be relatively simple. The important thing at the present time, and something I would like to see in 1.6, is a firm data model + interface, so that the user knows exactly where they stand: knows, for example, exactly what they need to provide if they generate a grid elsewhere and want to run LIME on it, and has the flexibility in LIME to allow that, without the present ugly need to hack the code to add more fields to the freads in predefgrid() or popsin().
I would also suggest (and will implement myself if desired) separate tasks to translate FITS grid files to and from the present ASCII/binary versions.
If #92 in this modified form were accepted it would supersede Sergey's #66 and the remaining part of his #64.
There are three ways in which, IMO, the gridding needs to be improved.
Attila has a PR (#71), part of which addresses this issue; my #103 is also directed at it. I think my proposal is more generic, and it is also more targeted; #71 also contains suggested changes concerning other, only semi-related problems.
The present algorithm uses the rejection method to generate random point locations. This has the virtue of being simple and robust, but if the distribution function it is trying to follow has very great variations in value over its range, as is generally the case with e.g. molecular densities, then rejection becomes rather slow, since the algorithm spends most of its time rejecting points from areas where the function is relatively low-valued. My PR #93 proposes an alternative algorithm which is much faster. This would allow grid point distributions to be more faithful to extreme variations in the density function, without a significant time penalty. It might also obviate the need for Attila's special regions, which seem to me a bit of a tack-on TBH, with extra user parameters needed. In fact, given a dedicated function for grid point number density, the user is free to include regions in this, or whatever else they like. I don't see that it is necessary to add more user parameters to achieve this functionality.
Smoothing consumes a lot of time in relative terms. Attila's #71 reduces the number of iterations from 20 to 5, which I have no objection to, but I also think the iteration number (together with many other hard-wired values in LIME!) ought to be added to the global variables in the header file.
Issues #58, #86, #91, #123 raise questions in this area. There are two sources of input about the (bulk, i.e. non-radiating) molecular species present in the physical medium: the 'collision partners' listed in the molData file, and the vector-valued density function supplied by the user.
On the software side the present arrangement is a poor one: the connection between these two sources of information is loose, there is no clear definition of which source has 'authority', and so forth. The fact is that there is a need to identify which species corresponds to which density return. In current LIME this is achieved (although not rigorously) through a complicated procedure of guessing. It is all very messy and unsatisfactory.
There is also very little flexibility in current LIME in how (i) dust opacity and (ii) radiating-species number densities are calculated.
PR #128 aims to tackle all of these issues. I was reluctant to add to the list of user-settable parameters but here I could not think of another way to tackle the problem. There is a need to add functionality and that is all there is to it. Hopefully however the suggested interface is generic enough to cover all the cases raised in the various issues, while also (in the absence of the user supplying the parameters) defaulting to as near to the present 'guessing' behaviour as possible.
In theory LIME can deal with more than 1 species, but there are in fact numerous places in the code where an attempt to do so will generate a seg fault at worst, and wrong values at best. Tackling all of these places is certainly beyond my present resources; what I have been doing with my PRs is to clear them up as I come to them. A number of problems with multi-species support are repaired in #128, for example.
Blended lines are another situation which LIME 1.5 claims to be able to handle, but in fact there are traps and problems with the code which is supposed to implement this as well. #118 implements a fairly thorough fix.
As noted by various people at various times, the photon-propagation algorithm contains a number of inaccuracies. Tuomas's #83 addresses some of these. This should be included in 1.6.
I have a number of PRs which concern raytracing. I would like to see them all in, but they could perhaps be unified into a single PR to make checking/merging easier. The desired improvements come under three heads:
PR #109 offers some speedups. However Christian in issue #95 raised the point that there is an algorithm from 1.4 which was never imported into 1.5 and which offers speed improvements. I'll try to add this to my raytracing PRs.
There is a big problem with LIME in that the output image shows step-function artefacts at the boundaries of Voronoi cells. Step functions cause a lot of added noise when Fourier-transformed so as to simulate visibilities. In present practice it is usual to run LIME multiple times and average the images to smooth this out a bit. Thus 'smoother' perhaps should also come under the head of 'faster'.
My PR #96 addresses this by offering a new algorithm which works via linear interpolation inside Delaunay tetrahedra. The images still have artefacts, but they are linear-function artefacts rather than step-function ones, i.e. there are step discontinuities in the image gradient rather than in the image itself. This makes for a much cleaner Fourier transform. The new algorithm is slower (although a memory rearrangement would allow a lot of precalculation, which ought to speed things up), but if the task needs to be run fewer times, this ought amply to compensate.
PR #79 (which makes use of some of Christian's 1.4 code) addresses inadequacies of the current version in dealing with image pixels within which a significant number of grid points fall. The new algorithm smooths the output for such pixels and removes the need for anti-aliasing.