Amazon unveils surprise new video and image AI models to compete with the best on the market
New Amazon video and image generation models unveiled
- Amazon unveils new image and video creation AI tools
- Amazon Nova Canvas and Nova Reel look to help ecommerce sellers
- Both new Nova models available now on Bedrock
Amazon has announced new image and video generation models as it steps up its fight to become an AI heavyweight.
The company unveiled Amazon Nova Canvas and Nova Reel at its AWS re:Invent 2024 event in Las Vegas, with CEO Andy Jassy revealing the launch as part of a new Nova series of AI models.
Both new models are available now on Amazon Bedrock, with the launches set to take Amazon into direct competition with the likes of OpenAI and xAI's Grok when it comes to image and video creation.
Amazon Nova Canvas and Reel
The new models will initially target sellers and other users on Amazon's ecommerce platform, allowing them to quickly and cheaply create media content to enrich their pages.
Amazon gave few specifics about the new offerings, but did say that Nova Canvas will allow users to create and edit images using natural language text inputs, while Nova Reel can provide "studio-quality" video, with features such as camera motion control, 360-degree rotation, and zoom.
In a blog post announcing the news, the company noted that customers on its Amazon Ads platform using the new models advertised five times more products, and used twice as many images per advertised product, widening their reach to buyers across the globe.
Looking forward, Jassy also revealed Amazon will be launching a Speech-to-Speech generation model in early 2025, followed by an "Any-to-Any" model in mid-2025.
The former will be able to analyse and understand streaming speech input in natural language, interpreting verbal and nonverbal cues such as tone and cadence so that it can reply in a natural, human-like way.
The latter, which Jassy described as a true multimodal-to-multimodal model, will be able to take in text, images, audio, and video, and produce output in whichever of those modes is required.
Mike Moore is Deputy Editor at TechRadar Pro. He has worked as a B2B and B2C tech journalist for nearly a decade, including at one of the UK's leading national newspapers and fellow Future title ITProPortal, and when he's not keeping track of all the latest enterprise and workplace trends, can most likely be found watching, following or taking part in some kind of sport.