Digitizing DNA

Allan

New member
Joined
Jun 26, 2010
Messages
86
Reaction score
0
Points
0
Location
Phoenix
I've recently read that it may be feasible to digitize animal and plant DNA and store them on hardware media. I think the article was related to space exploration but I can't be sure. I'm researching for a new book and, of course, I cannot find the reference to that recollection anywhere online. Did anyone else happen to see that article/blog post?

If DNA can be digitized, how would it be seen on a standard computer screen (helix, or 1's and 0's, or perhaps hexadecimal)? The whole idea of digitally warehousing DNA to reconstruct or archive is really compelling.

Is anyone familiar with this idea?

Thanks in advance!
 

jedidia

shoemaker without legs
Addon Developer
Joined
Mar 19, 2008
Messages
10,891
Reaction score
2,141
Points
203
Location
between the planets
I've recently read that it may be feasible to digitize animal and plant DNA and store them on hardware media.

DNA is one of the most (if not the most) compact data storage mediums known to us. Storing it as digital data would certainly be feasible (data is data), but it would take up a whole darn lot of space. In my opinion, doing the reverse (replicating a DNA structure as data storage) would be far more interesting.

Still, DNA has an awful lot of redundancy, so we could probably save most of the relevant information of some species on a few Google servers. But a comprehensive library of DNA would quickly overwhelm our globally available data storage (hmmm... I never had a decent idea to send to XKCD's What If, but this would be an interesting one).
 

kamaz

Unicorn hunter
Addon Developer
Joined
Mar 31, 2012
Messages
2,298
Reaction score
4
Points
0
I've recently read that it may be feasible to digitize animal and plant DNA and store them on hardware media.

The database is this way >>>.

---------- Post added at 09:29 PM ---------- Previous post was at 09:25 PM ----------

DNA is one of the most (if not the most) compact data storage mediums known to us. Storing it as digital data would certainly be feasible (data is data), but it would take up a whole darn lot of space.

The human genome is 3.2 billion base pairs, so even using the most inefficient scheme -- one byte per base pair -- it's only 3.2 GB.
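
A quick sanity check of that arithmetic, as a minimal Python sketch. The function name `genome_size_bytes` and the 2-bit comparison are illustrative assumptions, not taken from any genome database, and decimal gigabytes (10^9 bytes) are assumed.

```python
# Rough storage estimate for a genome, assuming a fixed number of bits per base pair.
HUMAN_GENOME_BP = 3_200_000_000  # 3.2 billion base pairs, the figure quoted above


def genome_size_bytes(base_pairs: int, bits_per_base: int) -> float:
    """Return the storage size in bytes for the given encoding width."""
    return base_pairs * bits_per_base / 8


for bits in (8, 2):
    size_gb = genome_size_bytes(HUMAN_GENOME_BP, bits) / 1e9  # decimal GB
    print(f"{bits} bits per base pair: {size_gb:.1f} GB")
```

Eight bits per base pair reproduces the 3.2 GB figure; a tighter encoding only makes the number smaller.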
 

Artlav

Aperiodic traveller
Addon Developer
Beta Tester
Joined
Jan 7, 2008
Messages
5,790
Reaction score
780
Points
203
Location
Earth
Website
orbides.org
Preferred Pronouns
she/her
I was under the impression that this has been done routinely for a while now.
Genetic engineering, the genome project, etc. all need a whole DNA sequence in computer memory.

The encoding is base 4, so 2 bits per base pair - a human DNA should fit on a good old CD.

If DNA can be digitized how would it be seen on a standard computer screen (helix or 1's and 0's or perhaps hexadecimal?).
Numbers or colours, or letters.

DNA itself is digital data - representing it should be a matter of convenience for whoever is working with it.
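
To make the base-4 idea concrete, here is a small Python sketch that packs four bases into each byte and unpacks them again. The A/C/G/T to 0/1/2/3 mapping and the names `pack` and `unpack` are arbitrary choices for this illustration, not a standard file format.

```python
# Pack a DNA string into 2 bits per base and unpack it again.
# The A/C/G/T -> 0/1/2/3 mapping is an arbitrary choice for this sketch.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def pack(sequence: str) -> bytes:
    """Pack bases four to a byte, first base in the highest two bits."""
    packed = bytearray()
    for start in range(0, len(sequence), 4):
        chunk = sequence[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        byte <<= 2 * (4 - len(chunk))  # pad a short final chunk with zero bits
        packed.append(byte)
    return bytes(packed)


def unpack(packed: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])


sequence = "GATTACA"
packed = pack(sequence)
print(len(sequence), "bases ->", len(packed), "bytes")
print(unpack(packed, len(sequence)) == sequence)
```

The sequence length has to be kept alongside the bytes, because the zero padding in the last byte is indistinguishable from trailing A's.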
 

Urwumpe

Not funny anymore
Addon Developer
Donator
Joined
Feb 6, 2008
Messages
37,657
Reaction score
2,379
Points
203
Location
Wolfsburg
Preferred Pronouns
Sire
The encoding is base 4, so 2 bits per base pair - a human DNA should fit on a good old CD.

Still, nature is way more efficient: all DNA in all your cells would have less volume than a microSD card.
 

RisingFury

OBSP developer
Addon Developer
Joined
Aug 15, 2008
Messages
6,427
Reaction score
492
Points
173
Location
Among bits and Bytes...
DNA is one of the most (if not the most) compact data storage mediums known to us. Storing it as digital data would certainly be feasible (data is data), but it would take up a whole darn lot of space. In my opinion, doing the reverse (replicating a DNA structure as data storage) would be far more interesting.

About a CD worth for the entire human, even if you don't compress it very well. About 10 MB is the difference that makes you who you are.
 

Urwumpe

So...which one has more trouble with corrupted data?:p

The CD ... our DNA has way more redundancy and ECC.

A CD gets read way less often than your DNA.
 

kamaz

The CD ... our DNA has way more redundancy and ECC.

Actually, error correction in DNA is pretty crappy by IT standards:

The overall error rate of DNA polymerase in the replisome is 10^-8 errors per base pair. Repair enzymes fix 99% of these lesions for an overall error rate of 10^-10 per bp. That means one mutation in every 10 billion base pairs that are replicated.

The human haploid genome is 3.2 × 10^9 bp. That means that on average there are 0.31 mutations introduced every time the genome is replicated. In the male, there are approximately 400 cell divisions between zygote and the production of a sperm cell. This gives a total of about 124 new mutations in every sperm cell. In the female, there are about 30 cell divisions between zygote and the production of egg cells. That's about 9 new mutations in every egg cell.

http://sandwalk.blogspot.com/2013/03/estimating-human-human-mutatin-rate.html
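
The numbers in that passage can be retraced with a few lines of Python. This is only a sketch of the quoted arithmetic; the function names are made up for illustration. Note that straight multiplication of the quoted rate and genome size gives 0.32 per replication, slightly above the 0.31 the passage uses for its totals.

```python
# Retrace the arithmetic in the quoted passage, using its own figures.
RAW_ERROR_RATE = 1e-8        # polymerase errors per base pair
REPAIRED_FRACTION = 0.99     # share of errors fixed by repair enzymes
HAPLOID_GENOME_BP = 3.2e9    # base pairs


def mutations_per_replication(raw_rate: float, repaired: float, genome_bp: float) -> float:
    """Expected new mutations each time the genome is copied once."""
    return raw_rate * (1 - repaired) * genome_bp


def new_mutations(per_replication: float, divisions: int) -> float:
    """Expected mutations accumulated over a chain of cell divisions."""
    return per_replication * divisions


rate = mutations_per_replication(RAW_ERROR_RATE, REPAIRED_FRACTION, HAPLOID_GENOME_BP)
print(f"per replication: {rate:.2f}")

QUOTED_PER_REPLICATION = 0.31  # the value the passage carries forward
for cell_type, divisions in (("sperm", 400), ("egg", 30)):
    total = new_mutations(QUOTED_PER_REPLICATION, divisions)
    print(f"{cell_type}: about {total:.0f} new mutations")
```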
 

Urwumpe

Actually, error correction in DNA is pretty crappy by IT standards:



http://sandwalk.blogspot.com/2013/03/estimating-human-human-mutatin-rate.html

Yes, but that is 1990s genetic knowledge. At that time, people still thought that the DNA between the genes was useless waste that served no function at all, or that only one enzyme handled transcription or replication of DNA.

The past 10 years have been really exciting in terms of what we have learned about genetics....
 

jedidia

About a CD worth for the entire human, even if you don't compress it very well. About 10 MB is the difference that makes you who you are.

Yes, but look at the size of it. Anyway, I no longer remembered exactly how much relevant data there really is. I just knew that the total amount of data is enormous, but 99% redundant. Looks like the actual "base data" is a lot less than I thought, and the backups take a lot more space. Well, we know that from digital storage too, after all :p
 

Matias Saibene

Development hell
Joined
Jul 7, 2012
Messages
1,060
Reaction score
652
Points
128
Location
Monte Hermoso - Argentina
Website
de-todo-un-poco-computacion-e-ideas.blogspot.com.ar
This topic of genetic manipulation reminds me of the movie The Island

So...which one has more trouble with corrupted data?:p
I imagine a misreading of DNA:
Windows Human has found an error reading DNA. Press CTRL + ALT + DEL to restart.:compbash:
And so most of humanity became extinct:facepalm:.

So I've installed Ubuntu for human beings.:rofl:
 

fsci123

Future Dubstar and Rocketkid
Addon Developer
Joined
Aug 18, 2010
Messages
1,536
Reaction score
0
Points
0
Location
?
For some reason I had thought that this thread was about storing computer information on DNA. I was reading an article about that last night...

Isn't most of human DNA really just discarded sections of viruses, or randomly repeating but non-coding units?
 

Allan

I read somewhere that in raw binary it only takes 2 bits to store each nucleotide - that would only be 800 megabytes, or slightly more than a CD. I also read that the storage capacity within human DNA is practically beyond comprehension - 700 TB in a single gram of DNA (http://www.extremetech.com/extreme/...rams-700-terabytes-of-data-into-a-single-gram), and that was in 2012.

I'm wondering if we'd recognize DNA if it was presented to us digitally on a machine that wasn't set up to decipher DNA and present it as a helix. Would we see 1's and 0's and not understand what we're looking at, or would some pattern develop that would tip off someone with a mind for such material?
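
To get a feel for what that might look like, here is a minimal Python sketch that shows one short fragment as letters, as bits, and as hexadecimal. The 2-bit code (A=00, C=01, G=10, T=11) and the function names are assumptions for this example; a real file could use any mapping.

```python
# Show one short DNA fragment as letters, bits and hexadecimal.
# The 2-bit code (A=00, C=01, G=10, T=11) is an assumption for this sketch.
CODE = {"A": "00", "C": "01", "G": "10", "T": "11"}


def to_bits(sequence: str) -> str:
    """Concatenate the 2-bit code of every base."""
    return "".join(CODE[base] for base in sequence)


def to_hex(sequence: str) -> str:
    """Render a non-empty sequence as hex digits, padded to whole bytes."""
    bits = to_bits(sequence)
    bits += "0" * (-len(bits) % 8)  # pad with zero bits up to a byte boundary
    return f"{int(bits, 2):0{len(bits) // 4}x}"


fragment = "GATTACAT"
print("letters:", fragment)
print("bits:   ", to_bits(fragment))
print("hex:    ", to_hex(fragment))
```

Without knowing the mapping, the bit and hex forms look like any other binary data, so a viewer would have to be told (or guess) that every two bits stand for one base.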
 

kamaz

Isn't most of human DNA really just discarded sections of viruses, or randomly repeating but non-coding units?

Not really random -- we have many sequences which are useful but inactive.

For example, most animals do not get scurvy because they can synthesize vitamin C, while humans must get it from food. Humans also have the piece of DNA responsible for this functionality, but it is inactive -- some base pairs got flipped and it no longer works.

And then there is this: Endogenous retrovirus - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Endogenous_retrovirus)

This is a good book: http://www.amazon.com/Relics-Eden-Powerful-Evidence-Evolution/dp/1616141603
 

Thorsten

Active member
Joined
Dec 7, 2013
Messages
785
Reaction score
56
Points
43
I've recently read that it may be feasible to digitize animal and plant DNA and store them on hardware media

Researchers do it quite routinely these days, but it isn't as useful as you probably think it is, because the information isn't actually stored in the DNA alone; it is implicit in the DNA together with its environment of proteins and other cellular chemistry. So a DNA sequence on a CD doesn't actually tell you how to build a human being.

Consider - every cell carries the same genetic information.

(Side note for the perfectionists: I know that's not strictly true either, there are mutations, and there are systematic drifts in DNA across different cells in the human body, but let's leave this aside for the moment as it isn't relevant for the following.)

So - if the cell replicates based on the same information, how does it know whether it is supposed to become a nerve, a muscle fiber or part of the skin?

The key idea is 'gene expression' - proteins determine which genes are actually read from the sequence (and a gene is a more general concept than a fixed part of the sequence - if you shift the reading frame by a base pair, you can read the same sequence and get a different gene). Proteins can block or unblock genes, and proteins sense the cellular chemistry and environmental conditions. But DNA encodes which proteins to build, and proteins also drive the cellular chemistry. These are three interlocking aspects of the way the information is represented.

So as a human develops from a fertilized cell, there's an intricate cascade of events happening - the external environment defines an 'up' and 'down' and cells feel whether they're 'inside' or 'outside' of the heap of dividing cells, that triggers proteins starting to read different genes dependent on environment, so the embryo gets a first basic form, then proteins read yet other genes which make proteins which are able to build hormones which in turn can affect protein interaction elsewhere in the developing body.

Nowhere in the DNA is there a blueprint of how to build a human being; the information emerges implicitly as this whole cascade unfolds in the proper environment. So a CD full of DNA sequences is useless on its own, because you don't have the environment to actually unfold this information properly.

It's actually one of the most fascinating concepts for storing information I know, because it also solves the problem of self-reference:

Say DNA were the instructions to build a cell. But DNA is part of the cell, so the blueprint needs to contain how to make the DNA. So the blueprint needs to contain a blueprint of how to build itself - but that leads to an infinite regress. It's akin to the problem of writing computer code that does nothing except print its own source code on the screen - that's a fairly tricky problem outside of interpreted languages.
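
For the curious, such a self-printing program (usually called a quine) can be sketched in Python. The two-line program is held in a string here (`QUINE_SOURCE`, a name chosen for this example) so that it can be executed and its output compared with its own text; the helper `run_and_capture` is likewise just an illustration.

```python
import contextlib
import io

# A classic two-line Python quine, kept as text so it can be run and compared.
QUINE_SOURCE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"


def run_and_capture(source: str) -> str:
    """Execute the source in a fresh namespace and return what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


output = run_and_capture(QUINE_SOURCE)
print(QUINE_SOURCE, end="")
print("prints its own source:", output == QUINE_SOURCE)
```

The trick is that the string `s` is used twice: once as a template and once as the data filled into that template, which avoids the infinite regress of a blueprint containing a full copy of itself.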



I read somewhere that in raw binary it only takes 2 bits to store each nucleotide - that would only be 800 megabytes, or slightly more than a CD. I also read that the storage capacity within human DNA is practically beyond comprehension

Well, a base pair encodes either A, C, G or T, and genes are just sequences of triplets AAC ACT GAT TAC CAT which each encode an amino acid to make a protein from, so you have a molecule per letter and three molecules per amino acid in addition to the double-helix structure stabilizing it. Which is fairly efficient.
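
The triplet reading, and the earlier point about shifting the reading frame, can be sketched in Python using the example sequence above. The function name `codons` is made up for this illustration, and the table is the standard genetic code restricted to the triplets that occur in this one example.

```python
# Split a DNA string into triplets (codons) for each of the three reading frames.
# The table is the standard genetic code, limited to codons in this example.
CODON_TABLE = {
    "AAC": "Asn", "ACT": "Thr", "GAT": "Asp", "TAC": "Tyr", "CAT": "His",
    "ACA": "Thr", "CTG": "Leu", "ATT": "Ile", "ACC": "Thr",
    "CAC": "His", "TGA": "Stop", "TTA": "Leu", "CCA": "Pro",
}


def codons(sequence: str, frame: int) -> list[str]:
    """Return complete triplets starting at offset `frame` (0, 1 or 2)."""
    return [sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3)]


sequence = "AACACTGATTACCAT"
for frame in range(3):
    triplets = codons(sequence, frame)
    amino_acids = [CODON_TABLE[c] for c in triplets]
    print(frame, " ".join(triplets), "->", " ".join(amino_acids))
```

The same fifteen letters give three different amino acid chains depending on where reading starts, and one of the shifted frames even runs into a stop codon.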
 

fsci123

Once again I have another question: what about mitochondrial DNA? Does it need to be stored along with the nuclear DNA?
 