
    Turning senses into media with Artificial Intelligence to Perceive

    Humans perceive the world through different senses: we see, feel, hear, taste and smell. Each sense is a separate channel of information, and perception that draws on several channels at once is called multimodal. Does this mean that what we perceive can be seen as multimedia?

    Xue Wang, a Ph.D. candidate at LIACS (Leiden Institute of Advanced Computer Science), translates perception into multimedia and uses Artificial Intelligence (AI) to extract information from multimodal data, similar to how the brain processes information. In her research, she studied how AI learns from such data in four different settings.

    Putting words into vectors

    First, Xue looked into word-embedding learning: the translation of words into vectors. A vector is a quantity with two properties, a direction and a magnitude; in a word embedding, words with related meanings are represented by vectors that point in similar directions. This part of the research deals with improving how information is classified. Xue proposed a new AI model that links words to images, which makes it easier to classify words. While the model was being tested, an observer could intervene and correct the AI when it made a mistake. The research shows that this model performs better than a previously used model.
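    To make the idea of words as vectors concrete, here is a minimal sketch of how embeddings can be used for classification. It is not Xue's model: the 3-dimensional word vectors and the class "prototype" vectors are invented for illustration, and classifying by the most similar prototype direction (cosine similarity) is one common technique, assumed here for simplicity.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration.
# Real embeddings are learned from data and usually have hundreds of dimensions.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
    "truck": [0.2, 0.1, 0.8],
}

# Hypothetical class prototypes: one reference vector per category.
PROTOTYPES = {
    "animal": [1.0, 1.0, 0.0],
    "vehicle": [0.0, 0.0, 1.0],
}

def magnitude(v):
    """Length (magnitude) of a vector."""
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: compares direction and ignores magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (magnitude(a) * magnitude(b))

def classify(word, prototypes):
    """Assign a word to the class whose prototype points in the most similar direction."""
    vec = EMBEDDINGS[word]
    return max(prototypes, key=lambda label: cosine_similarity(vec, prototypes[label]))

for w in EMBEDDINGS:
    print(w, "->", classify(w, PROTOTYPES))
```

    With these toy vectors, "cat" and "dog" land in the animal class and "car" and "truck" in the vehicle class. Using direction rather than raw distance means a word's class does not depend on how long its vector happens to be.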

    Looking at sub-categories

    A second focus of the research is images accompanied by other information. For this topic, Xue explored the potential of labeling sub-categories, also known as fine-grained labeling. She used an AI model designed to categorize images that come with little surrounding text. The model merges coarse labels, which are general categories, with fine-grained labels, which are the sub-categories. The approach proved effective and helps handle both easy and difficult categorizations.
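    One simple way to merge coarse and fine-grained labels is to multiply the probability of a general category by the probability of each sub-category within it. The sketch below assumes exactly that (it is not necessarily the model used in the research); the label hierarchy, the classifier outputs, and the confidence `threshold` are all hypothetical. It falls back to the coarse label when no sub-category is confident enough, which is one way a hierarchy can separate easy cases from difficult ones.

```python
# Hypothetical two-level label hierarchy: coarse category -> fine-grained sub-categories.
HIERARCHY = {
    "bird": ["sparrow", "eagle"],
    "dog": ["beagle", "husky"],
}

def combine(coarse_probs, fine_given_coarse):
    """Merge both levels: P(fine) = P(coarse) * P(fine | coarse)."""
    fine_probs = {}
    for coarse, p_coarse in coarse_probs.items():
        for fine in HIERARCHY[coarse]:
            fine_probs[fine] = p_coarse * fine_given_coarse[coarse][fine]
    return fine_probs

def predict(coarse_probs, fine_given_coarse, threshold=0.5):
    """Return the best fine label if it is confident enough, else fall back to the coarse label."""
    fine_probs = combine(coarse_probs, fine_given_coarse)
    best_fine = max(fine_probs, key=fine_probs.get)
    if fine_probs[best_fine] >= threshold:
        return best_fine
    return max(coarse_probs, key=coarse_probs.get)

# Hypothetical classifier outputs for a single image.
coarse_probs = {"bird": 0.8, "dog": 0.2}
fine_given_coarse = {
    "bird": {"sparrow": 0.3, "eagle": 0.7},
    "dog": {"beagle": 0.5, "husky": 0.5},
}

print(predict(coarse_probs, fine_given_coarse))                 # eagle
print(predict(coarse_probs, fine_given_coarse, threshold=0.6))  # bird
```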

    Finding relations between images and text

    Third, Xue researched the association between images and text. A problem with this topic is that the relationship between image information and text information is not linear, which makes it difficult to measure with simple linear methods. Xue found a potential solution: a kernel-based transformation. Kernel methods are a class of machine-learning algorithms that measure the similarity between data points as if the data had been mapped into a higher-dimensional space, where non-linear relationships become easier to capture. With this model, the AI can capture the relationship in meaning between images and text.
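    The core idea behind kernel methods (the "kernel trick") can be shown in a few lines. The article does not say which kernel the research used; the sketch below uses two standard kernels as assumptions, and the image and text feature vectors are hypothetical. It checks that a degree-2 polynomial kernel, computed cheaply in the original 2-D space, equals an ordinary inner product in an explicit 3-D feature space.

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2, computed in the original 2-D space."""
    dot = x[0] * y[0] + x[1] * y[1]
    return dot ** 2

def feature_map(x):
    """Explicit 3-D feature map phi(x) whose inner product reproduces poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def explicit_inner(x, y):
    """Inner product phi(x) . phi(y), computed the expensive way."""
    fx, fy = feature_map(x), feature_map(y)
    return sum(a * b for a, b in zip(fx, fy))

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: similarity decays with squared distance.
    Its implicit feature space is infinite-dimensional, so it is only ever used via the kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

image_feat = (1.0, 2.0)  # hypothetical image feature vector
text_feat = (3.0, 1.0)   # hypothetical text feature vector

print(poly_kernel(image_feat, text_feat))     # 25.0
print(explicit_inner(image_feat, text_feat))  # approximately 25.0 (floating-point rounding)
print(rbf_kernel(image_feat, text_feat))
```

    The point of the design is that an algorithm only ever needs the kernel values, so it can work with non-linear relationships without building the high-dimensional features explicitly.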

    Finding contrast in images and text

    Lastly, Xue focused again on images accompanied by text. In this part, the AI had to contrast words with images. The model performed a task called phrase grounding: linking the nouns in an image caption to the regions of the image they refer to. Unlike in the first part, no observer could intervene in this task. The research showed that AI can link image regions to nouns with an accuracy on par with the average for this field of research.
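    Phrase grounding can be sketched as scoring every image region against a noun and picking the best match. In this minimal sketch, the noun and region embeddings are hypothetical, and the InfoNCE-style contrastive loss is an assumption suggested by the section's focus on contrast; the article does not state the training objective. Such a loss rewards a noun for scoring its matching region well above the other regions, which needs no human observer.

```python
import math

# Hypothetical embeddings: a real system would obtain these from a text encoder
# (for the caption nouns) and an image encoder (for the detected image regions).
NOUNS = {
    "dog": [0.9, 0.1, 0.0],
    "frisbee": [0.1, 0.9, 0.1],
}
REGIONS = {
    "region_A": [0.8, 0.2, 0.1],
    "region_B": [0.0, 0.8, 0.3],
    "region_C": [0.1, 0.1, 0.9],
}

def dot(a, b):
    """Similarity score between a noun and a region embedding."""
    return sum(x * y for x, y in zip(a, b))

def ground(noun_vec, regions):
    """Phrase grounding: return the region whose embedding scores highest against the noun."""
    return max(regions, key=lambda r: dot(noun_vec, regions[r]))

def contrastive_loss(noun_vec, positive, regions, temperature=0.1):
    """InfoNCE-style loss: small when the positive region scores well above all other regions."""
    logits = {r: dot(noun_vec, v) / temperature for r, v in regions.items()}
    m = max(logits.values())  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits.values()))
    return log_sum_exp - logits[positive]

for noun, vec in NOUNS.items():
    print(noun, "->", ground(vec, REGIONS))
# dog -> region_A
# frisbee -> region_B
```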

    The perception of artificial intelligence

    This research contributes to the field of multimedia information: it shows that AI can classify words, categorize images, and link images to text. Future research can build on the methods Xue proposed and may lead to further insights into how AI perceives multimedia.

    ELE Times Research Desk