I've recently read that it may be feasible to digitize animal and plant DNA and store it on hardware media.
Researchers do this quite routinely these days, but it isn't as useful as you probably think, because the information isn't actually stored in the DNA alone. It is stored implicitly in the DNA together with its environment, in the form of proteins and other cellular chemistry. So a DNA sequence on a CD doesn't actually tell you how to build a human being.
Consider this: every cell carries the same genetic information.
(Side note for the perfectionists: I know that's not strictly true either. There are mutations, and there are systematic drifts in DNA across different cells in the human body, but let's leave this aside for the moment, as it isn't relevant for what follows.)
So if every cell divides using the same information, how does a given cell know whether it is supposed to become a nerve cell, a muscle fiber or part of the skin?
The key idea is 'gene expression': proteins determine which genes are actually read from the sequence. (And a gene is a more general concept than just a stretch of the sequence: if you shift the reading frame by one base pair, you can read the same sequence and get a different product.) Proteins can block or unblock genes, and proteins respond to the cellular chemistry and to environmental conditions. But DNA encodes which proteins to build, and proteins in turn shape the cellular chemistry. So DNA, proteins and cellular chemistry are three interlocking aspects of the way the information is represented.
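To make the reading-frame point concrete, here is a small Python sketch that reads the same DNA string starting at offsets 0, 1 and 2 and translates each frame with the standard genetic code. It reuses the example codons AAC ACT GAT TAC CAT, reads only one strand in one direction, and ignores everything about how a cell actually decides where to start reading. The function name `translate_frame` is just an illustrative choice.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
# (last base varying fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_frame(seq: str, offset: int) -> str:
    """Read `seq` in triplets starting at `offset` (0, 1 or 2).

    A trailing partial codon at the end is ignored.
    """
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


seq = "AACACTGATTACCAT"  # the example codons AAC ACT GAT TAC CAT, run together
for offset in range(3):
    print(offset, translate_frame(seq, offset))
# 0 NTDYH
# 1 TLIT
# 2 H*LP
```

The same fifteen letters yield three unrelated amino-acid strings (one of them containing a stop codon), which is the sense in which the raw sequence alone doesn't pin down what gets built.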
So as a human develops from a fertilized egg cell, an intricate cascade of events takes place. The external environment defines an 'up' and a 'down', and cells sense whether they are on the 'inside' or the 'outside' of the cluster of dividing cells. That triggers proteins to start reading different genes depending on the environment, so the embryo gets a first basic form. Then proteins read yet other genes, which make proteins that can build hormones, which in turn can affect protein interactions elsewhere in the developing body.
There is no blueprint of how to build a human being anywhere in the DNA; the information emerges implicitly as this whole cascade unfolds in the proper environment. So a CD full of DNA sequences is useless on its own, because you don't have the environment needed to unfold this information properly.
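The "same DNA, different outcome" idea can be illustrated with a toy Boolean gene-network model. This is a deliberate and heavy simplification of real gene regulation, and everything in it is hypothetical: the gene names, the rules and the 'inside'/'outside' signals are invented purely for illustration. Every cell gets the identical rule set (standing in for the DNA); only the environmental signal differs, and the cascade settles into a different end state.

```python
# Hypothetical toy "genome": a gene is expressed when all of its activators
# are present and none of its repressors are. All names are made up.
GENOME = {
    "gene_a": ({"outside"}, set()),        # switched on by an environmental cue
    "gene_b": ({"gene_a"}, set()),         # switched on by gene_a's product
    "gene_c": (set(), {"gene_a"}),         # on by default, repressed by gene_a
    "skin_marker": ({"gene_b"}, set()),
    "nerve_marker": ({"gene_c"}, set()),
}


def develop(environment, genome=GENOME, max_steps=20):
    """Update all genes in lockstep until the set of active products stops changing.

    Gives up after `max_steps` updates and returns the last state if it never settles.
    """
    state = frozenset(environment)
    for _ in range(max_steps):
        expressed = {
            gene
            for gene, (activators, repressors) in genome.items()
            if activators <= state and not (repressors & state)
        }
        new_state = frozenset(environment) | expressed
        if new_state == state:
            break
        state = new_state
    return state


print(sorted(develop({"outside"})))  # same rules, outer cell
print(sorted(develop({"inside"})))   # same rules, inner cell
```

Nothing in `GENOME` says "build skin" or "build a nerve"; which marker ends up active depends on the rules and the environment together.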
It's actually one of the most fascinating ways of storing information I know of, because it also solves the problem of self-reference:
Say DNA were the instructions for building a cell. But DNA is part of the cell, so the blueprint would need to describe how to make the DNA. In other words, the blueprint would need to contain a blueprint of how to build itself, which leads to an infinite regress. It's akin to the problem of writing a computer program that does nothing except print its own source code on the screen (a so-called quine), which is a fairly tricky problem if the program isn't allowed to simply read its own source file.
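For the programming side of the analogy, here is a minimal Python quine: running it prints exactly its own three lines of source, without opening its own file. The way it sidesteps the infinite regress is, loosely speaking, similar in spirit to the point above: the program contains a description of itself (the string `s`) only once, and the code uses that description twice, once as a template and once as the data filled into the template. Even the comment line has to be part of the template, because anything not in `s` would not be reproduced.

```python
# A Python quine: running this file prints exactly these three lines.
s = '# A Python quine: running this file prints exactly these three lines.\ns = %r\nprint(s %% s)'
print(s % s)
```

The `%r` slot inserts the quoted, escaped form of `s` into itself, and `%%` becomes a single `%` in the output, which is what makes the printed text match the source character for character.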
I read somewhere that in raw binary it only takes 2 bits to store each nucleotide; that would only be 800 megabytes, or slightly more than a CD. I also read that the storage capacity within human DNA is practically beyond comprehension.
Well, each base pair encodes one of four letters (A, C, G or T, as read on a given strand), and the protein-coding parts of genes are sequences of triplets such as AAC ACT GAT TAC CAT, each of which encodes one amino acid of the protein to be built. So you have one molecule per letter and three per amino acid, plus the double-helix structure that stabilizes it all. That's fairly efficient.
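The 2-bits-per-nucleotide figure from the question is easy to check. The sketch below packs four bases into each byte (the mapping A, C, G, T to 0 to 3 is an arbitrary choice assumed here) and estimates the size of a genome of roughly 3.1 billion base pairs, which is an approximate, commonly cited figure for one copy of the human genome. The helpers `pack` and `unpack` are illustrative names. Note that this captures only the raw letters, which is exactly the limitation described at the top of this answer.

```python
# Map each nucleotide to a 2-bit code (this particular assignment is arbitrary).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte, lowest bits first."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases; the length has to be stored separately."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))


# Rough size estimate; 3.1e9 base pairs is an approximation, not an exact count.
GENOME_BP = 3_100_000_000
size_mb = GENOME_BP * 2 / 8 / 1_000_000
print(f"{size_mb:.0f} MB")  # 775 MB, a little more than a ~700 MB CD
```

So the "about 800 megabytes" figure holds up for the bare sequence; what it leaves out is the cellular machinery needed to turn that sequence into anything.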