Artificial Intelligence

Google’s VideoPoet Multimodal Model Creates Both Video and Audio

Google researchers have introduced VideoPoet, a language model that accepts multimodal inputs, including text, images, videos, and audio, and produces videos. VideoPoet uses a decoder-only transformer architecture and works in a zero-shot manner, meaning it can generate content for tasks it has not been specifically trained on. Training consists of two steps, mirroring the approach used for large language models (LLMs): pretraining and task-specific adaptation. The pretrained LLM serves as a versatile foundation that can be adapted to various video generation tasks, the researchers explained.

Unlike competing video models built on diffusion, which add noise to training data and then learn to reconstruct it, VideoPoet consolidates many video generation capabilities into a single large language model (LLM), rather than relying on separately trained components for each task.

Its capabilities include text-to-video, image-to-video, video stylization, video inpainting and outpainting, and video-to-audio generation. VideoPoet is an autoregressive model: it generates each piece of output by conditioning on the content it has already generated. It is trained on video, text, image, and audio data, and uses tokenizers to convert input between these modalities.
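To make the idea of autoregressive generation concrete, here is a minimal sketch in Python. It is not VideoPoet's code: `toy_next_token` is a hypothetical stand-in for a trained model, and the vocabulary size of 8 is an arbitrary assumption. The only point it illustrates is that every new token is predicted from all the tokens produced so far.

```python
import random


def toy_next_token(context, vocab_size=8):
    """Hypothetical stand-in for a trained model.

    Picks a token ID deterministically from the context so the
    example is reproducible; a real model would return a learned
    prediction instead.
    """
    rng = random.Random(sum(context) + len(context))
    return rng.randrange(vocab_size)


def generate(prompt_tokens, num_new_tokens, next_token_fn=toy_next_token):
    """Autoregressive loop: each step sees the prompt plus all prior output."""
    tokens = list(prompt_tokens)
    for _ in range(num_new_tokens):
        # The model conditions on everything generated so far.
        tokens.append(next_token_fn(tokens))
    return tokens


if __name__ == "__main__":
    # Three prompt tokens followed by five generated tokens.
    print(generate([1, 2, 3], 5))
```

In a multimodal system of the kind described above, the token IDs would come from tokenizers for text, images, video, or audio, which is what would let one loop of this shape serve several modalities.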

“Our results suggest the promising potential of LLMs in the field of video generation,” the researchers said. “For future directions, our framework should be able to support ‘any-to-any’ generation, e.g., extending to text-to-audio, audio-to-video, and video captioning should be possible, among many others.”

Text to video
Text prompt: Two pandas playing cards

Image to video with text prompts
Text prompt accompanying the images (from left):

  1. A ship navigating the rough seas, thunderstorms and lightning, animated oil on canvas
  2. Flying through a nebula with many twinkling stars
  3. A wanderer on a cliff with a cane looking down at the swirling sea fog below on a windy day

Image (left) and video generated (immediate right)

Zero-shot video stylization
VideoPoet can modify a pre-existing video based on text prompts.

In the provided examples, the original video is on the left, while the stylized version is immediately adjacent to it. From left to right: A wombat wearing sunglasses and holding a beach ball on a sunny beach; teddy bears gracefully ice skating on a crystal clear frozen lake; a metal lion roaring in the radiant light of a forge.

Video to audio
The researchers first generated 2-second video clips, and VideoPoet then predicted the corresponding audio without relying on any text prompts.

VideoPoet can also produce a brief film by assembling multiple short clips. The researchers began by asking Bard, Google’s alternative to ChatGPT, to draft a short screenplay as a set of prompts. They then generated video clips from these prompts and combined them to produce the final short film.

Longer videos, editing and camera motion
Google stated that VideoPoet addresses the challenge of generating longer videos by conditioning on the last second of a video to predict the next second. The researchers explained, “By chaining this process repeatedly, we demonstrate that the model not only effectively extends the video but also maintains the visual fidelity of all objects consistently across multiple iterations.”
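The chaining described here can be sketched as a simple loop. This is an illustration under stated assumptions, not Google's implementation: frames are represented as plain integers, the frame rate of 8 frames per second is an arbitrary choice, and `toy_predictor` is a hypothetical placeholder for the model.

```python
def extend_video(frames, predict_next_second, fps=8, extra_seconds=3):
    """Extend a clip by repeatedly predicting one more second.

    Each iteration conditions only on the last `fps` frames
    (one second of video) and appends the predicted second.
    """
    video = list(frames)
    for _ in range(extra_seconds):
        context = video[-fps:]  # the most recent second of video
        video.extend(predict_next_second(context))
    return video


def toy_predictor(context):
    """Hypothetical stand-in for the model: continues a counting sequence."""
    last = context[-1]
    return [last + i + 1 for i in range(len(context))]


if __name__ == "__main__":
    clip = list(range(8))  # one second of "frames" at 8 fps
    longer = extend_video(clip, toy_predictor, fps=8, extra_seconds=3)
    print(len(longer))     # four seconds of frames in total
```

Because each step sees only the previous second, the length of the final video is limited by how many times the loop runs rather than by the size of the model's context.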

VideoPoet can also change the motion of objects in existing videos. For instance, a video featuring the Mona Lisa can be prompted to show her yawning. Text prompts can likewise be used to apply camera movements to existing images.

To illustrate, the initial image was generated with the following prompt: “Adventure game concept art of a sunrise over a snowy mountain by a crystal clear river.”

Subsequently, additional prompts were applied in sequence from left to right: “Zoom out,” “Dolly zoom,” “Pan left,” “Arc shot,” “Crane shot,” and “FPV drone shot.”


Artificial Intelligence

Samsung and Google Cloud Bring Gen AI to Samsung Galaxy S24 Series

Samsung Electronics and Google Cloud have announced a new multi-year partnership to bring Google Cloud’s generative artificial intelligence (AI) technology to Samsung smartphone users around the globe. Starting with the Samsung Galaxy S24 series announced today at Galaxy Unpacked in San Jose, California, Samsung will be the first Google Cloud partner to deploy Gemini Pro and Imagen 2 on Vertex AI via the cloud to its smartphone devices.

 

“Google and Samsung have long shared deeply held values around the importance of making technology more helpful and accessible for everyone. We’re thrilled that the Galaxy S24 series is the first smartphone equipped with Gemini Pro and Imagen 2 on Vertex AI,” said Janghyun Yoon, Corporate EVP and Head of Software Office of Mobile eXperience Business at Samsung Electronics. “After months of rigorous testing and competitive evaluation, the Google Cloud and Samsung teams worked together to deliver the best Gemini-powered AI experience on Galaxy.”

Samsung is the first Google Cloud partner to deploy Gemini Pro on Vertex AI to consumers. Built from the ground up to be multimodal, Gemini can generalize and seamlessly understand, operate across, and combine different types of information, including text, code, images, and video. Starting with Samsung-native applications, users can take advantage of summarisation features across Notes, Voice Recorder, and Keyboard. Gemini Pro on Vertex AI provides Samsung with critical Google Cloud features, including security, safety, privacy, and data compliance.

Galaxy S24 series users can also immediately benefit from Imagen 2, Google’s most advanced text-to-image diffusion technology from Google DeepMind to date. With Imagen 2 on Vertex AI, Samsung can bring safe and intuitive photo-editing capabilities into users’ hands. These features can be found in Generative Edit in the S24’s Gallery application. As part of this partnership, Samsung is also one of the first customers to test Gemini Ultra, Google’s largest and most capable model for highly complex tasks. The S24 series will also use Gemini Nano, the most efficient Gemini model for on-device tasks, which is an on-device LLM delivered as part of the Android 14 operating system.

“Together with Samsung, Google Cloud sees the tremendous opportunity for generative AI to create meaningful mobile experiences that stimulate and strengthen connection and communication for millions,” said Thomas Kurian, CEO, Google Cloud. “With Gemini, Samsung’s developers can leverage Google Cloud’s world-class infrastructure, cutting-edge performance, and flexibility to deliver safe, reliable, and engaging generative AI-powered applications on Samsung smartphone devices.”


Artificial Intelligence

Acer Launches New Swift Go AI PCs with Intel Core Ultra Processors

Acer has expanded its Swift family of thin and light laptops with new Intel Core Ultra processors featuring Intel’s first neural processing unit (NPU) and integrated AI acceleration capabilities. Now even more performance-minded, capable, and intuitive for content creation, schoolwork, productivity, and play, the new Swift laptops’ powerful processing and AI-supported features further the laptop’s usability.

“After unveiling our first Intel Core Ultra laptops last month, we’re debuting even more products in our Swift line to help a wider range of customers take advantage of premium laptop experiences and AI-supported technology for more exciting and effective PC use,” said James Lin, General Manager, Notebooks, IT Products Business, Acer. “Plus, these laptops feature impressive updates that help customers do more – and do them even better.”

“Through our deep, technical collaboration with Acer, we are building beyond the CPU focusing on power efficiency, graphics, and AI usages. The Acer Swift Go and Acer Swift X 14 laptops are excellent examples of AI PCs being delivered to market, powered by Intel Core Ultra processors and featuring an all-new NPU to enable AI on clients. We are excited to have our customers experience the enhanced collaboration, productivity, and creativity these AI PCs deliver,” said Jim Johnson, Senior Vice President and General Manager of the Client Business Group, Intel.

Acer Swift Go Laptops – The Latest Tech with OLED Displays, Wi-Fi 7 and AI Features
Certified as Intel Evo platform laptops, the new Acer Swift Go 16 (SFG16-72) and Swift Go 14 (SFG14-73) are powered by new Intel Core Ultra processors and Intel Arc built-in GPUs, delivering the in-demand pairing of premium performance and all-day battery life of up to 12.5 hours for the 14-inch laptop and up to 10.5 hours for the 16-inch version.

Other improvements that enhance user experiences and creativity with AI-supported features make the Swift Go laptop lines even more attractive. Built for the next wave of computing, the laptops give users one-click access to Copilot in Windows through a dedicated Copilot key, so they can harness the power of AI to optimize time spent working, creating, and playing on the device. Both AI PCs also include a 1440p QHD webcam with temporal noise reduction (TNR) for higher-quality video, paired with Acer PurifiedView’s AI-boosted conferencing features, including Background Blur, Automatic Framing, and Eye Contact.

The webcam is complemented by Acer PurifiedVoice 2.0 technology with AI noise reduction, which pairs with the three microphones to capture crisp, clear audio and reduce background noise and voices beyond that of the speaker. The addition of Intel Wi-Fi 7 delivers internet connection speeds up to 2.4x faster than Wi-Fi 6E and keeps the laptops connected when it matters most.

Both laptops feature the Swift line’s thin-and-light aluminium chassis, which can be opened to 180 degrees so that they lie flat for easy collaboration. Certain Swift Go 14 models offer an optional multi-control lighting touchpad that enables direct media commands, as well as an OceanGlass touchpad that ensures smooth and productive scrolling. Both touchpads also offer 44% more scrolling space to support on-the-go lifestyles and mouse-free scrolling. Plus, the Swift Go models now feature the latest Intel Unison 2.0, which enables a fast and easy connection between the laptop and the customer’s Android or iOS devices.

Like their predecessors, the new Swift Go devices feature vibrant, colour-rich OLED displays with 500-nit peak brightness, 100% DCI-P3 colour range, and DisplayHDR True Black 500 certification. Images are vivid and detailed on the Swift Go 16’s 16-inch 3.2K OLED display with a 3200 x 2000 resolution and 120 Hz refresh rate, while the Swift Go 14 presents a 14-inch 2.8K OLED display with a 2880 x 1800 resolution and a 90 Hz refresh rate. Both models include TÜV Rheinland Eyesafe display certification to help reduce the effects of eye strain. Plus, they feature the option for WUXGA touchscreen displays to enable touch and pen input, especially helpful for note-taking and sketching.

In addition, the Swift Go laptops now support up to 32 GB of LPDDR5X memory and up to 2 TB of upgradable PCIe Gen 4 SSD storage with dual slots. Finally, the devices carry the latest port selection for connecting to peripherals and displays, including dual USB Type-C ports with Thunderbolt 4 (both offering fast charging), a USB Type-A port with offline charging, HDMI 2.1, and a microSD card reader. The laptops also include Bluetooth LE Audio for improved wireless audio.

Acer Swift X 14 – AI PC with Calman-Verified Display for Creators and Students
The new Acer Swift X 14 (SFX14-72G) extends the tradition of empowering a variety of users with powerful combinations and amazing displays, now with the latest Intel Core Ultra processors, NVIDIA GeForce RTX 40 Series Laptop GPUs, and a newly Calman-verified 2.8K OLED panel. Powered by new Intel Core Ultra H-Series processors and up to NVIDIA GeForce RTX 4070 Laptop GPUs, the premium laptop enables faster AI-enhanced workflows for higher-quality live streaming, video editing, and 3D rendering.

NVIDIA DLSS 3.5 technology brings AI-powered ray reconstruction for upscaled lighting effects and resolution in graphics-heavy games and applications. Plus, the Swift X 14 is NVIDIA Studio-validated, optimised with pre-installed NVIDIA Studio Drivers, and includes Copilot in Windows built-in (with a dedicated Copilot key) that harnesses the power of AI to work alongside users when navigating their devices and applications.

The Calman-verified 14.5-inch 2.8K OLED display delivers colour accuracy of Delta E<2 in animation, photos, and video so that creators and professionals can bring their designs to life. Bold colours combine with high clarity on the OLED display thanks to its 100% DCI-P3 colour range, high contrast rating, and 500-nit peak brightness with VESA DisplayHDR True Black 500 certification. The display features Acer Light Sensing technology, which adapts colour temperature and brightness to ambient lighting conditions. Finally, the 120 Hz refresh rate ensures smooth video and entertainment viewing.

In addition to content creation, the Swift X 14 is ready for productivity, helping users accomplish work and school tasks. Conferencing benefits from an FHD webcam with AI-enabled Temporal Noise Reduction (TNR), Acer PurifiedView, and PurifiedVoice 2.0 with AI Noise Reduction, along with three microphones for enhanced audio capture. The latest ports and connectivity (dual USB Type-C ports, an HDMI 2.1 port, a microSD card reader, Wi-Fi 6E, and Bluetooth LE Audio) keep the device functional and connected, while advanced thermal technologies (a large fan, dual D6 copper heat pipes, and a dedicated air inlet keyboard) help keep the laptop cool. The Swift X 14 is also well equipped with up to 32 GB of LPDDR5X memory and up to 1 TB of PCIe Gen 4 SSD storage.

The Acer Swift Go 16 (SFG16-72/T) will be available in EMEA in February, starting at $1,149. The Acer Swift Go 14 (SFG14-73/T) will be available in EMEA in February, starting at $1,099. The Acer Swift X 14 (SFX14-72G) will be available in EMEA in February, starting at $1,799.


Artificial Intelligence

Character.ai: Young Generation Welcomes AI Therapy Chatbots

In an unexpected turn of events, millions of users on Character.ai, a renowned platform for crafting chatbots, are now seeking solace in AI therapist bots for mental health support.

Among the array of fictional and real personas on the platform, the standout character is “Psychologist,” accumulating an astonishing 78 million messages, with 18 million exchanged since November alone.

The brain behind the Psychologist bot is Sam Zaia, a psychology student from New Zealand, who expressed initial surprise at its widespread popularity. He originally designed the bot for his own use, for moments when friends were unavailable and traditional therapy was financially impractical. Users have since lauded it for providing comfort and helping them navigate their emotions, with one hailing it as a “lifesaver.”

Character.ai, boasting 20 million registered users and a daily visitor count of 3.5 million, primarily attracts individuals aged 16 to 30. While the platform is populated by various characters, mental health-related bots like “Therapist” and “Are you feeling OK?” have garnered millions of messages, signalling a growing interest in AI-powered mental health support.

Despite the platform’s entertainment and role-playing focus, the success of mental health bots prompts questions about their efficacy. Psychotherapist Theresa Plewman, after trying the Psychologist bot, expressed reservations about how quickly it makes assumptions and offers advice, compared with a human therapist. However, she acknowledged that the immediacy of AI bots could be valuable for those in urgent need.

Sam Zaia, the creator of the Psychologist bot, recognizes that AI cannot fully replace human therapists currently but is intrigued by technology’s potential. He is engaged in a post-graduate research project exploring the emerging trend of AI therapy and its appeal to young people.

Critics raise concerns about AI bots potentially lacking comprehensive information gathering and providing biased or inadequate advice. The company behind Character.ai, while acknowledging user support for these AI characters, underscores the importance of consulting certified professionals for legitimate advice and guidance.

The surge of AI therapist bots on Character.ai mirrors a broader trend in the digital landscape, where technology is increasingly utilized to address mental health challenges. While some caution against potential drawbacks, others view these bots as valuable tools to cope with the strain on public mental health services.
