Using Generative AI or Chatbots with Multimodal Projects

Overview

Generative AI (GenAI) is a type of artificial intelligence that creates various forms of media, such as text or images, when given a prompt. GenAI learns patterns from data and uses them to make predictions when it’s asked a question or tasked with a problem.

GenAI technology can be a means to enhance our learning and creativity. However, it’s important to consider ethics, an instructor’s course policies, and your own learning goals when using these tools.

This writing guide will focus on using GenAI with multimodal text. See our guide "How Can I Use Generative AI or Chatbots In The Writing Process?" on using chatbots (ChatGPT, UM-GPT, Bard, Bing) and alphabetic texts.

What are multimodal texts?

Multimodal texts go beyond or include alphabetic text and may include still and moving images, animations, color, music, and sound. These texts use multiple modes to communicate, create arguments, and engage audiences. The modes include the visual, linguistic (alphabetic text and spoken word), spatial (dealing with space and how other modes are organized, emphasized, and contrasted), gestural (gesture and movement, such as with a public speaker moving their hands and walking), and aural (what an audience can and cannot hear, such as music, silence, and podcasts).

How can I use GenAI to create multimodal texts?

For multimodal projects, GenAI tools can help generate images, video, audio, and more.

Image generators

Tools like Firefly, DALL-E, Midjourney, and Stable Diffusion can generate images from text prompts you provide.

Especially when using image generators, be aware of ethical issues such as copyright and artist compensation. Some GenAI tools learn from images made by artists who haven’t been compensated or credited for their work. Other tools, such as Shutterstock, have systems in place that compensate artists when users pay a small fee for images without a watermark.

Like text-based GenAI tools, image generation tools are often biased. After generating an image from a prompt, especially one that depicts people, critique the image and consider whether it perpetuates stereotypes or bias. Tools like the Stable Diffusion Bias Explorer can help you identify limitations and misrepresentations.

Presentation and video tools

Slidesai.io is a Google Slides add-on that turns text you insert into visually appealing presentation slides that you can then tinker with.
Canva’s suite of tools offers many capabilities, including slide templates, design elements, and background removers for video.
Speechify is a text-to-speech tool: you input text and receive voiceover audio files, with a wide array of voice options.
ChartGPT, not to be confused with ChatGPT, creates graphs and charts to visualize data based on the text prompts you enter.

Resources and Reminders

More information regarding the University of Michigan’s GenAI offerings, including UM-GPT, as well as information regarding GenAI’s limitations, privacy risks, reputable sources, and ethics can be found on the U-M Guidance for Students page.

Remember that GenAI tools are not always accurate and will most likely require your input and revisions. Aim to stay in control of what the GenAI tool outputs, not the other way around.

There is a vast array of websites out there; we’ve listed those that offer some capabilities in their free or low-cost versions. Please read the user and licensing agreements before use. More popular tools not listed here can be found on U-M's GenAI resource page.
