AI powers video analytics for e-commerce.

Artificial intelligence (AI) is reducing the noise in online videos, helping shoppers find the information they need faster.

Researchers at MIT and IBM have developed an AI method that can help viewers navigate directly to the most relevant parts of a video. At the same time, Video Summarizer AI and Mindstamp focus on providing interactive and multilingual summaries of educational videos to improve learning productivity and accessibility.

“By feeding the audio transcript for a video into AI and augmenting that AI with additional metadata, viewers can 'conversate' with the video, resulting in instant answers to their questions and direct dynamic links to relevant content,” Brett Lindenberg, CEO and founder of Mindstamp, a software company that creates interactive videos, told PYMNTS.
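As a rough illustration of answering a question from a transcript, the sketch below picks the timestamped transcript segment that shares the most words with a viewer's question. It is an assumption-laden toy, not Mindstamp's implementation: simple word overlap stands in for the AI model, and the transcript, function names, and data layout are all made up for the example.

```python
import re
from typing import List, Optional, Tuple

Segment = Tuple[float, str]  # (start time in seconds, spoken text)


def words(text: str) -> set:
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_segment(question: str, transcript: List[Segment]) -> Optional[Segment]:
    """Return the transcript segment sharing the most words with the question.

    Hypothetical helper: returns None when no segment shares any word.
    """
    query = words(question)
    best, best_overlap = None, 0
    for segment in transcript:
        overlap = len(query & words(segment[1]))
        if overlap > best_overlap:
            best, best_overlap = segment, overlap
    return best


# Made-up transcript, for illustration only.
transcript = [
    (0.0, "Welcome, today we unbox the new stand mixer"),
    (35.0, "Attach the bowl by twisting it until it locks"),
    (80.0, "To clean the whisk, rinse it under warm water"),
]
print(best_segment("How do I clean the whisk?", transcript))
```

The returned start time is what a player would need in order to jump the viewer to the matching moment.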

PYMNTS reported earlier that, as Amazon and Walmart look to increase sales through content, Amazon Live has launched an interactive, shoppable FAST channel on Prime Video and Amazon Freevee. The channel allows viewers to use their mobile devices to engage with and purchase from the content they watch on their TVs.

Dealing with challenges

MIT researchers have created a new approach to teaching AI models spatio-temporal grounding, which involves identifying the start and end times of specific actions within a video. Traditional methods for this task require extensive human annotation, which can be expensive, time-consuming, and subjective. The challenge lies in defining the precise boundaries of an action, such as deciding when “baking a pancake” begins: is it when the chef starts mixing the batter or when the batter is poured into the pan?
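To make "start and end times" concrete, here is a minimal sketch that is not the MIT method. It assumes some model has already produced one relevance score per video frame for a query such as "baking a pancake", and it simply thresholds those scores into time segments. The function name, threshold, frame rate, and scores are illustrative assumptions.

```python
from typing import List, Tuple


def scores_to_segments(
    scores: List[float], fps: float = 1.0, threshold: float = 0.5
) -> List[Tuple[float, float]]:
    """Turn per-frame relevance scores into (start, end) times in seconds.

    Hypothetical helper: a frame counts as "in the action" when its score
    is at or above the threshold, and consecutive such frames are merged.
    """
    segments = []
    start = None  # index of the first frame of the current run, if any
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            segments.append((start / fps, i / fps))
            start = None
    if start is not None:  # the action runs to the end of the video
        segments.append((start / fps, len(scores) / fps))
    return segments


# Made-up scores, one per second of video.
example_scores = [0.1, 0.2, 0.7, 0.9, 0.8, 0.3, 0.1, 0.6, 0.6]
print(scores_to_segments(example_scores))  # [(2.0, 5.0), (7.0, 9.0)]
```

The hard part, deciding where an action really begins, lives in the scores and the threshold; the sketch only shows the final bookkeeping.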

The MIT team uses unlabeled instructional videos and text transcripts from websites like YouTube as training data to overcome these problems. The training process is divided into two parts: First, a machine learning model is trained to understand which actions occur at specific times throughout the video, creating a global representation. Second, the model is trained to focus on specific regions where action occurs, creating a spatial representation. This allows the model to focus on relevant objects and actions rather than the entire scene.

The researchers also added a component to reduce misalignments between the narration and the video, such as when a chef talks about a step before performing it. To develop a more realistic solution, they focus on uncut videos that span several minutes, unlike most AI techniques, which train on clips of a few seconds that have been trimmed to show only one action.

To evaluate their approach, the MIT researchers had to create a new benchmark dataset using a new annotation technique that works well for identifying multistep actions. Instead of drawing boxes around important objects, users mark the point where objects intersect, such as where the edge of a knife cuts through a tomato. This method enables the model to learn from more natural, uncut videos and accurately identify the start and end times of complex actions.

The MIT team's approach has important implications for domains ranging from e-commerce to education. By eliminating the need for expensive and time-consuming human annotation, their method enables AI models to learn from a wide array of unlabeled instructional videos, making the training process more efficient and allowing the models to generalize across tasks and domains.

In e-commerce, this technology can help shoppers quickly find the information they need in product videos, such as demonstrations of specific features or assembly instructions. By identifying critical moments within a video, the AI model can provide users with links to relevant content, which can enhance the overall shopping experience.
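One way such links could look is sketched below. It uses the temporal fragment syntax from the W3C Media Fragments URI specification (`#t=start,end`, in seconds); whether a given retailer's video player honors that fragment is an assumption, and the URL, labels, and times are invented for the example.

```python
from typing import List, Tuple


def moment_links(
    video_url: str, moments: List[Tuple[str, float, float]]
) -> List[str]:
    """Build one labeled link per moment using a temporal media fragment."""
    links = []
    for label, start, end in moments:
        # "#t=start,end" asks a fragment-aware player to play only that span.
        links.append(f"{label}: {video_url}#t={start:g},{end:g}")
    return links


# Hypothetical product video and hand-written moments, for illustration only.
links = moment_links(
    "https://example.com/blender-demo.mp4",
    [("Assembly", 12, 47.5), ("Pulse feature", 90, 118)],
)
for line in links:
    print(line)
```

In practice the moments would come from a grounding model rather than being written by hand.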

Video summary for education

Video Summarizer AI and Mindstamp are focusing on educational video content by providing multilingual and interactive summaries aimed at improving learning productivity and accessibility.

Klym Zhuravlov-Iuzefovych, creator of Video Summarizer AI, said the tool makes video-based learning more productive by letting students interact with video lectures in their native language, potentially overcoming language barriers and promoting engagement.

Mindstamp's AI-powered platform aims to create interactive elements within videos. “Using AI to analyze videos, AI can create a series of interactive elements within the video, including questions to confirm understanding and third-party content to add additional insights,” Lindenberg explained. With links to data sources, links to further AI explanations of topics, and more, the video effectively becomes an interactive educational or training resource.

Additionally, Lindenberg notes that “AI can identify key pieces of video and dynamically create chapters, links, references and branching between videos,” which can further enhance the educational value of video content.

Video Summarizer AI is built on a custom GPT (Generative Pre-trained Transformer) model, designed to understand and summarize complex and lengthy educational content across different subjects and academic levels. The tool's integration with the ChatGPT interface and OpenAI's technology offers a seamless user experience on desktop and mobile devices.

Beyond shopping and education, the MIT research and tools like Mindstamp could streamline video-based employee training and telemedicine. As video content becomes increasingly central to online life, AI innovations from MIT, IBM, Video Summarizer AI, and Mindstamp could affect customer experience, learning productivity, and engagement.

Although these technologies show potential, it is important to view claims about their effectiveness with caution and to wait for further testing before drawing conclusions about their impact on e-commerce, education, and other domains. As these technologies evolve and integrate, they could create a new era of user-friendly, efficient, and comprehensive video-based experiences across industries. However, more evidence is needed to substantiate these claims and determine the extent of their effects.

