I used a colour height map from the internet. The colours were probably completely wrong, but I wasn’t worried about that, as I could edit the .png as necessary.
That's the culprit. If you do not know what kind of color mapping the original source used, strange results are not uncommon.
The problem is this: how do you map a linear height value into the 3-dimensional color space? Usually, you walk the color-space cube along its edges, skipping the white and the black corners. Even with this simple coding there are multiple paths you can walk, so there is no single mapping you can point to and say "this is it".
There is also a second problem: bit depth. With three 8-bit color channels, you can theoretically encode 24-bit values. Orbiter's highest precision for height is currently 16 bit. So in theory, you can encode the full precision in a PNG with 8 bits per color channel. However, the classical edge-walking color-mapping only encodes around 10 bits. So I've extended the classic color-mapping a bit by varying the "brightness" of each pixel to increase the bit depth to 16 bits. This means you can create a color-mapped 8-bit-per-channel PNG without losing precision, making round-trips (i.e. convert to PNG, then back to ELV again) possible.
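To see where the "around 10 bits" comes from, here is a quick back-of-the-envelope calculation. It assumes the walk covers five cube edges (blue to cyan via magenta, red, yellow and green) and that one 8-bit channel changes by one step per color along each edge; the variable names are only illustrative and not taken from ele2png.

```python
import math

# Assumption: an open path over five edges of the RGB cube
# (blue -> magenta -> red -> yellow -> green -> cyan).
EDGES = 5
# Along one edge a single 8-bit channel runs from 0 to 255, i.e. 255 steps.
STEPS_PER_EDGE = 255

# Distinct colors on the path: all steps plus the starting corner.
levels = EDGES * STEPS_PER_EDGE + 1
bits = math.log2(levels)

print(f"{levels} distinct colors, about {bits:.2f} bits")
```

That gives 1276 distinct colors, a little over 10 bits and far short of the 65536 levels a 16-bit height needs, which is why some extra trick such as the brightness variation is required.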
The edge-walking that ele2png does is as follows: low=blue-magenta-red-yellow-green-cyan=high. If your source is not using the same encoding, you have to re-color it to match this mapping. In addition, chances are very high that your source does not use the same brightness inter-spacing to increase bit depth, so you see the "ripples" when decoding it with ele2png. The header value that determines whether inter-spacing is used is "colormap". Setting this value to 1 instead of 2 forces ele2png to fall back to the 10-bit LUT, effectively eliminating the "ripples".
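As a rough illustration of the plain 10-bit walk (not the brightness inter-spacing, and not ele2png's actual code), the blue-to-cyan path can be sketched like this. The function names and the 0..1275 index range are my own assumptions for the example.

```python
# Corner colors in the order low = blue ... cyan = high.
CORNERS = [
    (0, 0, 255),    # blue
    (255, 0, 255),  # magenta
    (255, 0, 0),    # red
    (255, 255, 0),  # yellow
    (0, 255, 0),    # green
    (0, 255, 255),  # cyan
]
STEPS = 255                              # steps along one cube edge
MAX_INDEX = STEPS * (len(CORNERS) - 1)   # 1275, the last index on the path


def index_to_rgb(index):
    """Map a path index (0..MAX_INDEX) to the color at that position."""
    if not 0 <= index <= MAX_INDEX:
        raise ValueError(f"index {index} outside 0..{MAX_INDEX}")
    if index == MAX_INDEX:
        return CORNERS[-1]
    edge, offset = divmod(index, STEPS)
    start, end = CORNERS[edge], CORNERS[edge + 1]
    # Only one channel differs between neighbouring corners, by exactly 255,
    # so the integer division below is exact.
    return tuple(s + (e - s) * offset // STEPS for s, e in zip(start, end))


# Reverse lookup table: exact path color -> index.
LUT = {index_to_rgb(i): i for i in range(MAX_INDEX + 1)}


def rgb_to_index(rgb):
    """Map an exact path color back to its index (KeyError if off the path)."""
    return LUT[tuple(rgb)]


if __name__ == "__main__":
    for i in (0, 255, 300, MAX_INDEX):
        print(i, index_to_rgb(i))
```

A source image that walks the cube in a different order (say red to blue) would need its colors remapped onto this path before the lookup gives sensible heights.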
"ele2png" also tries to guess a value when the color of a pixel is outside the valid mapping range (e.g. black or white pixels are usually absent in strict color maps). The guessing is based on the color "distance" to valid color map values. Besides the usual blue-to-cyan mapping, simple linear color maps are supported as well, such as the gray-value mapping (i.e. all three channels have the same value, like 55,55,55 or 120,120,120), or single-channel maps (e.g. only the red channel encodes the height value). The maximum mapping error during conversion (i.e. the furthest distance any pixel had from a valid mapping color) is displayed if the "-v" option is given 2 or more times (e.g. "-vv"). This gives a hint on how much guessing the program had to do to interpret the given picture.
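The idea of guessing by color distance can be sketched as a nearest-neighbour search over the valid map colors. This is only a guess at the principle: it assumes plain Euclidean distance in RGB and the 10-bit blue-to-cyan path, and the helper names are made up for the example. The metric ele2png really uses may differ.

```python
import math

# Corner colors of the assumed path: low = blue ... cyan = high.
CORNERS = [
    (0, 0, 255), (255, 0, 255), (255, 0, 0),
    (255, 255, 0), (0, 255, 0), (0, 255, 255),
]


def build_palette():
    """List every color on the path, in order from low to high."""
    palette = []
    for start, end in zip(CORNERS, CORNERS[1:]):
        for step in range(255):
            palette.append(
                tuple(s + (e - s) * step // 255 for s, e in zip(start, end))
            )
    palette.append(CORNERS[-1])
    return palette


PALETTE = build_palette()


def nearest_index(rgb):
    """Return (index, distance) of the valid color closest to rgb."""
    best = min(range(len(PALETTE)), key=lambda i: math.dist(PALETTE[i], rgb))
    return best, math.dist(PALETTE[best], rgb)


def decode_with_max_error(pixels):
    """Decode pixels to path indices and track the worst distance seen."""
    indices, max_error = [], 0.0
    for rgb in pixels:
        index, error = nearest_index(rgb)
        indices.append(index)
        max_error = max(max_error, error)
    return indices, max_error


if __name__ == "__main__":
    print(decode_with_max_error([(0, 0, 255), (250, 3, 0), (0, 0, 0)]))
```

A maximum error near zero means the picture really uses this mapping; a large one (a black pixel is 255 away from the nearest valid color here) means most heights were guessed.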
That said, I'd suggest getting a 16-bit grayscale PNG as the source instead. Less trouble.