Pocket-sized DNA computers might save the world by 2030

CATALOG head of molecular biology Tracy Kambara prepares Shannon, the first commercially viable automated DNA storage and computation platform for enterprise use
(Image credit: Catalog)

We first heard about Catalog, a pioneer in DNA-based data storage, in October 2020 and interviewed David Turek, its CTO and an IBM alumnus.

Almost a year later, the company has announced a $35 million Series B round led by Hanwha Impact Partners, along with plans to launch its first chemical-based computing platform, which combines data management (including storage) and computation via synthetic DNA manipulation.

The time was therefore right to catch up with Catalog and put its CEO, Hyunjun Park, in the interviewee seat.

1. So what is the latest on Shannon? What has happened since we last interviewed Dave Turek (the CTO of Catalog)?

Over the past year, CATALOG has worked with several leading IT, Energy, and Media and Entertainment companies on collaborations to help advance the technology for commercialization. Through this work, CATALOG has discovered broad applicability of our platform across industry sectors, as well as nearly universal demand for what DNA-based computing promises among heavy data users. Early applications that we can speak about at the moment include digital signal processing, such as seismic processing in the energy sector, and database comparisons, such as fraud protection and identity management in the financial industry.

2. Right now Shannon is a bit like the ENIAC of its generation: bulky, slow, expensive and limited, but groundbreaking. If we were to fast forward to 2030, what would Shannon v10 look like?

Shannon helped prove that automating and scaling DNA-based storage, and now DNA-based computation, was achievable. For this purpose alone, it was important to build Shannon. As we move a decade out, future versions of the technology will be smaller and more portable, faster, and more efficient. It's certainly conceivable that by 2030 you could see desktop and pocket-sized versions of Shannon that use very small amounts of energy for both storage and compute.

3. DNA in computing is usually associated with storing data. Catalog wants to bring DNA into algorithms and applications, but how?

By computing with DNA, we mean transforming data encoded in DNA into some new kind of information. For example, if I have an input file of two large numbers, multiplying them together creates a number that was not previously present in the file; this is new information, the product of the two pieces of data. We believe that we can create a set of chemical “instructions” which can operate on DNA-encoded data to create new information. Examples would include, to begin with, problems in optimization (finding the biggest, the smallest, the best of something in finance, logistics, manufacturing), problems in signal processing (applied in areas like seismic processing in the oil and gas industry), and problems in inferencing and machine learning. The advantage with DNA is that we can perform these operations at extreme levels of parallelism, meaning we can apply billions or trillions of compute agents to work collectively on the problem at hand. Each of the compute agents (likely composed of a collection of molecules) will be relatively weak as a compute engine, but the opportunity to bring billions or trillions together to work a problem could dramatically reduce time to insight.
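As a rough software analogy for "many weak agents working collectively", the sketch below splits an optimization task (finding the largest value) across several simple workers and then combines their partial answers. This is only an illustration in ordinary Python; the function names `weak_agent` and `parallel_max` are hypothetical, and nothing here represents CATALOG's chemistry or instruction set.

```python
from concurrent.futures import ThreadPoolExecutor

def weak_agent(chunk):
    """One hypothetical 'compute agent': it can only scan its own small chunk."""
    return max(chunk)

def parallel_max(values, n_agents):
    """Find the largest value by giving each agent a slice of a non-empty input."""
    # Ceiling division so every value lands in exactly one chunk.
    size = max(1, -(-len(values) // n_agents))
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        partial_answers = list(pool.map(weak_agent, chunks))
    # Combine the agents' partial answers into the final result.
    return max(partial_answers)

if __name__ == "__main__":
    print(parallel_max([3, 9, 1, 7, 4], n_agents=2))
```

Each worker does very little on its own; the point of the pattern is that the amount of work per agent shrinks as more agents are added.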

Another domain of interest to us is search. We can use chemical instructions to quickly find data objects encoded into DNA, and as the amount of data we are searching grows, the time to solution will remain more or less invariant. That is not the case in many electronic search applications today. The reason for the difference is that a DNA store is a collection of molecules floating in a liquid, independent of the kind of physical organization that exists with electronic media: a tape cartridge has to be inspected in serial fashion because that is how it is physically organized (A precedes B, which precedes C, and so on). In a DNA file the molecules are all jumbled together in a liquid and can be searched directly. This compresses time to insight and reduces cost.
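The contrast between serial inspection and direct lookup can be sketched with a loose electronic analogy: scanning a list in order (tape-style) versus a keyed lookup whose cost does not grow with the number of records on average. The names `serial_search` and `direct_search` and the sample records are made up for illustration, and a Python dictionary is only a stand-in; it makes no claim about how chemical probes actually behave.

```python
def serial_search(tape, key):
    """Tape-style lookup: inspect records in order. Returns (value, steps taken)."""
    for steps, (record_key, value) in enumerate(tape, start=1):
        if record_key == key:
            return value, steps
    return None, len(tape)

def direct_search(pool, key):
    """Probe-style lookup: go straight to the matching record, if any."""
    return pool.get(key)

# Hypothetical data set: 100,000 keyed records.
records = [(f"id-{i}", i * i) for i in range(100_000)]
pool = dict(records)

value, steps = serial_search(records, "id-99999")
print(f"serial scan inspected {steps} records")
print(f"direct lookup agrees: {direct_search(pool, 'id-99999') == value}")
```

Doubling the number of records roughly doubles the worst-case work for the serial scan, while the keyed lookup stays about the same.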

4. Your funding news also mentions that DNA-based computation is expected in 2022. What does that mean and will it be more widely available?

By next year, CATALOG will demonstrate the value of DNA-based computation through a specific business use case. It will likely show the business value of analyzing data previously sitting in cold storage in one particular industry. Our expectation is that as use cases expand we will allow clients to access our technology via the web as a service (sometime in 2024); we also contemplate the possibility of building miniature devices capable of executing computation on a customer's premises at some later point.

5. Right now, a sample of DNA-based storage looks like an orange substance in a test tube. What shape/size will it ultimately take?

DNA-based storage is molecules of DNA floating in a liquid (orange in CATALOG's case because of the composition of the inks we use to encode DNA) or perhaps a pellet of DNA for long-term storage. There is great utility in having the storage be in liquid form because it presents the opportunity to find “records” in the file directly: we can create probes which, once inserted into the file, will find the targeted record or datum directly.

6. I asked Catalog one question last year and it was “how much will it cost?” Do we have an answer right now that we can share? What sort of storage density are we looking at, and what sort of cost per stored PB or TB?

The first commercialization option for DNA storage, followed by DNA-based computation, will likely be delivered as a service. We will announce pricing models a bit closer to the availability of that offering. The objective is to be approximately equal in cost to conventional storage but to express value by virtue of dramatic improvements in areal density (a million times more dense than electronic media), effectively infinite longevity, and the avoidance of technology obsolescence: DNA written today will be readable at any time in the future because DNA does not change, so there are no concerns such as firmware, OS, or device upgrades.

7. Right now, what are the biggest obstacles to the rapid development of the storage/computational capabilities of DNA and what’s being done to solve them?

Right now the obstacles are engineering in nature and focus on matters clients have viewed as consistently important with respect to any computational technology:  reliability, price performance, availability, consistency and so on.   We have a dedicated team of engineers, chemists and computer scientists sorting through each of these issues to create the kind of value metrics clients are accustomed to.  This includes miniaturization of the current machine, the expansion of automation covering the entire process, and the design and implementation of software infrastructure and tooling desired by clients.  

8. What are the current solutions being looked at to solve the throughput problem (e.g. 10MBps written is only 26TB per month)?

The current throughput attributes of Shannon are meant to help CATALOG better understand the limiting impacts of design choices we made on the machine, including the implications of scaling the chemistry underlying our encoding and computational models. We can adjust the throughput by changing some of the performance parameters on the current system, and this would have an impact of a few orders of magnitude. But we have begun to lay out other design choices that could go quite far beyond even that improvement. For example, the addition of incremental ink jet print heads has an exponential impact on the throughput of the machine. This is just one example of many adjustments or design choices available to us.
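The "26TB per month" figure in the question can be checked with simple arithmetic. The sketch below assumes a sustained write rate, a 30-day month and decimal units (1 TB = 1,000,000 MB); none of these assumptions are stated in the interview. The higher rates in the loop are purely hypothetical, included only to show what "a few orders of magnitude" would mean for monthly volume.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def tb_per_month(mb_per_second, days=30):
    """Data written in one month, in decimal TB (assumes 1 TB = 1,000,000 MB)."""
    return mb_per_second * SECONDS_PER_DAY * days / 1_000_000

if __name__ == "__main__":
    # 10 MB/s is the rate quoted in the question; the others are hypothetical.
    for orders_of_magnitude in range(4):
        rate = 10 * 10 ** orders_of_magnitude
        print(f"{rate:>6} MB/s -> {tb_per_month(rate):,.2f} TB/month")
```

At 10 MB/s this gives 25.92 TB per month, which rounds to the 26 TB quoted in the question.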

Desire Athow
Managing Editor, TechRadar Pro

Désiré has been musing and writing about technology during a career spanning four decades. He dabbled in website builders and web hosting when DHTML and frames were in vogue and started narrating about the impact of technology on society just before the start of the Y2K hysteria at the turn of the last millennium.
