
Enhancing Podcast Accessibility: A Guide to LLM Text Highlighting


Introduction

Imagine loving a podcast and wanting to remember the best bits, but it’s all sound and no text. What do you do? That’s where tools like LLMs and audio-to-text transcribers step in. They turn spoken words into written notes, letting you pick out the gems and create handy bullet points. Your favorite podcast moments are just a transcription away! Since ChatGPT’s debut in November 2022, LLMs have been all the rage. LLMs can be used for various tasks, and text summarization is an important application. Summarization also extends to modes other than text, such as audio and video. We can use an LLM to enhance podcast accessibility by generating bulleted highlights for ease of use or for future reference.

PaLM (Pathways Language Model) is a major LLM introduced by Google AI in April 2022. Its successor, PaLM 2, was announced in May 2023 as an improved and more up-to-date version. It is meant to have stronger multilingual, coding, and reasoning abilities. One advantage of the PaLM 2 API over other LLM APIs is that it is freely available for experimentation. Google positions PaLM 2 as having better reasoning abilities than its predecessor; how it compares with OpenAI’s ChatGPT depends on the task.

In this article, we will learn how to use these tools, specifically the PaLM 2 API and Maker Suite, to create a simple Podcast Text Highlighter, and how to tune the settings of the LLM to generate better bulleted summaries. We will look at the features of these tools and the different use cases where they can be applied. So let’s get started!

Learning Objectives

  • Understand the PaLM model and its features.
  • Learn about the model settings of PaLM.
  • Implement a Python project that generates a bulleted summary of a podcast audio.

This article was published as a part of the Data Science Blogathon.

Overview of PaLM 2

The original PaLM is a large neural network model with 540 billion parameters, scaled using the Pathways system. PaLM 540B outperforms the previous state of the art on a variety of multi-step reasoning tasks and outperforms average human performance on the BIG-bench benchmark. PaLM 2 builds on this work; Google has not officially published its parameter count. Like other LLMs, it learns the relationships between words and phrases and can use this knowledge for different tasks.

Pathways AI Architecture

Pathways is a new way of thinking about AI architecture that addresses many of the weaknesses of existing systems. Machine learning models tend to overspecialize at single tasks when they could excel at many. Below are the underlying ideas of this architecture:

  • Multiple Tasks: The basic idea is that, instead of training thousands of different models from scratch to do different tasks, we use the same model and extend its capabilities to perform new tasks, similar to how humans approach a new task.
  • Multimodal: Pathways could enable multimodal models that encompass vision, auditory, and language understanding simultaneously. So whether the model is processing the word “leopard,” the sound of someone saying “leopard,” or a video of a leopard running, the same response is activated internally: the concept of a leopard. The result is a model that is more insightful and less prone to errors and biases.
  • Sparse and Efficient: We can build a single model that is “sparsely” activated, which means that only small pathways through the network are activated as needed. In fact, the model dynamically learns which parts of the network are good at particular tasks; it learns how to route tasks through the most relevant parts of the model. Because we don’t activate the entire network for every task, this kind of architecture not only has a larger capacity to learn a variety of tasks, but is also faster and much more energy efficient.
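
The sparse-activation idea can be illustrated with a toy router. This is only a sketch under loud assumptions: the gate scores, the three "experts", and the `route_sparsely` helper are all hypothetical, and the source does not describe how Pathways actually implements routing.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_sparsely(x, gate_scores, experts, k=1):
    """Run only the k highest-scoring experts and mix their outputs."""
    probs = softmax(gate_scores)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    output = sum(probs[i] / norm * experts[i](x) for i in top)
    return output, top

# Three hypothetical "experts"; a real model would use learned sub-networks.
experts = [lambda v: v + 1.0, lambda v: v * 2.0, lambda v: -v]
output, active = route_sparsely(3.0, gate_scores=[0.1, 2.0, -1.0], experts=experts, k=1)
print(output, active)
```

Only the selected expert is evaluated, which is where the compute and energy savings of a sparse model would come from.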

PaLM 2 Features

PaLM 2 has been trained on text in over 100 languages and can pass language proficiency exams at the expert level. The original PaLM, at 540 billion parameters, is among the largest dense language models; GPT-4 is often reported to be larger, at around 1 trillion parameters, though OpenAI has not confirmed that figure. PaLM was trained efficiently on 6,144 TPU v4 chips across two pods. It uses a standard Transformer architecture in a decoder-only setup, with the modifications described below.

SwiGLU Activations

SwiGLU is used in the intermediate MLP layers, where it has been shown to give better quality than ReLU, GeLU, or Swish activations. SwiGLU uses a gating mechanism, which allows it to selectively pass information based on the input it receives. The quality gain comes at the cost of an extra matrix multiplication in the MLP. The SwiGLU activation is defined as follows:

SwiGLU(x) = Swish(xW) ⊗ (xV)

where x is the input to the layer, W and V are learned weight matrices, ⊗ denotes element-wise multiplication, and Swish(z) = z · sigmoid(z).

Unlike ReLU, which outputs exactly zero for all negative inputs and can leave “dead” neurons that no longer contribute to the output of the network, Swish is smooth and lets small negative values through, and the xV branch acts as a learned gate on each hidden unit.

[Figure: Difference between ReLU and SwiGLU]
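
A minimal sketch of a SwiGLU layer, assuming the usual definition from the GLU-variants literature, Swish(xW) ⊗ (xV). The tiny weight matrices `W` and `V` are made-up values for illustration, not PaLM weights, and the code uses plain Python lists rather than a tensor library.

```python
import math

def swish(z):
    """Swish (SiLU) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def vecmat(x, matrix):
    """Multiply a row vector x (length n) by an n x m matrix."""
    cols = len(matrix[0])
    return [sum(x[i] * matrix[i][j] for i in range(len(x))) for j in range(cols)]

def swiglu(x, W, V):
    """SwiGLU(x) = Swish(xW) * (xV), multiplied element by element."""
    gate = [swish(z) for z in vecmat(x, W)]
    value = vecmat(x, V)
    return [g * v for g, v in zip(gate, value)]

x = [1.0, -2.0]
W = [[0.5, 0.0, 1.0], [0.0, 1.0, 1.0]]   # hypothetical gate weights (2 x 3)
V = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]  # hypothetical value weights (2 x 3)
hidden = swiglu(x, W, V)
print(hidden)
```

Note that the layer needs two input projections (W and V) where a ReLU MLP needs one, which is the extra cost the gating mechanism pays for its quality gain.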

Parallel Formulation

A parallel formulation is used in every Transformer block instead of the serialized one used in the standard formulation. In the standard block, the MLP runs on the output of the attention sub-layer:

y = x + MLP(LayerNorm(x + Attention(LayerNorm(x))))

In the parallel block, the attention and MLP sub-layers both read the same normalized input, and their outputs are added together:

y = x + MLP(LayerNorm(x)) + Attention(LayerNorm(x))

The parallel formulation gives roughly 15% faster training at larger scales, because the input matrix multiplications of the MLP and attention sub-layers can be fused.

This should not be confused with data parallelism, which is a separate way to speed up training. Suppose an LLM has a vocabulary of 10,000 words, each represented by a 100-dimensional vector, and is trained on a dataset of 1 million sentences. Iterating over the dataset on a single accelerator and updating the parameters batch by batch can take a long time. With data parallelism, the dataset is divided into batches that are processed on many accelerators at the same time, which speeds up training considerably. PaLM uses both ideas, but “parallel formulation” refers to the layout of the Transformer block.
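
To make the block layout concrete, here is a small sketch comparing a serialized Transformer block with the parallel one described in the PaLM paper, y = x + MLP(LayerNorm(x)) + Attention(LayerNorm(x)). The `toy_attention` and `toy_mlp` functions are hypothetical stand-ins operating on a single vector; real sub-layers are learned and operate on whole sequences.

```python
def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]

def serial_block(x, attention, mlp):
    """Standard block: the MLP waits for the attention output."""
    h = add(x, attention(layer_norm(x)))
    return add(h, mlp(layer_norm(h)))

def parallel_block(x, attention, mlp):
    """Parallel block: attention and MLP read the same normalized input."""
    normed = layer_norm(x)
    return add(x, attention(normed), mlp(normed))

# Hypothetical stand-ins for the real sub-layers, chosen only for illustration.
toy_attention = lambda v: [0.5 * a for a in v]
toy_mlp = lambda v: [max(a, 0.0) for a in v]

x = [1.0, 2.0, 3.0]
print(serial_block(x, toy_attention, toy_mlp))
print(parallel_block(x, toy_attention, toy_mlp))
```

In the parallel version, `layer_norm` is computed once and neither sub-layer depends on the other, which is what lets an implementation fuse their input projections.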

Multi-Query Attention

The key and value projections are shared across all heads instead of being separate for each head, which results in cost savings at autoregressive decoding time. In multi-head attention, the entire attention computation is replicated h times, whereas in multi-query attention, every “head” of the query Q attends over the same K and V projections. The amount of computation performed by incremental MQA is similar to that of incremental MHA. The critical difference is the reduced amount of data read from and written to memory with MQA.
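
A minimal sketch of multi-query attention for a single decoding step, assuming scaled dot-product attention. The query, key, and value numbers are invented for illustration; the point is that several query heads reuse one shared set of keys and values.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def multi_query_attention(head_queries, shared_keys, shared_values):
    """Every query head attends over the SAME keys and values."""
    return [attend(q, shared_keys, shared_values) for q in head_queries]

# Two query heads, one shared key/value set covering three past tokens.
head_queries = [[1.0, 0.0], [0.0, 1.0]]
shared_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shared_values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs = multi_query_attention(head_queries, shared_keys, shared_values)
print(outputs)
```

With multi-head attention, the cache of past keys and values would hold one set per head; here it holds a single set regardless of the number of heads, which is the memory saving at decoding time.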

RoPE Embeddings

Rotary Positional Embedding (RoPE) is a type of positional embedding that unifies absolute and relative approaches and gives better results. It incorporates the relative positions of two tokens, rather than their absolute positions, when calculating self-attention. Transformers use self-attention or cross-attention mechanisms that are agnostic to the order of tokens. This means the model perceives the input tokens as a set rather than a sequence, and it thereby loses crucial information about the relationships between tokens based on their positions in the sequence. To mitigate this, positional encodings embed information about the token positions directly into the model.

This type of position embedding uses a rotation matrix to include an explicit relative position dependency in the self-attention formulation. Rotary embeddings matter for natural language processing because they allow models to better understand the context in which words are used. When a model has a better idea of the positions of the input tokens, it can produce more accurate predictions. For example, a language model that uses RoPE can better distinguish “I like pizza” from “Pizza is what I like,” where the same words play different roles because of their positions. With a better understanding of relative positioning, a model can make more nuanced predictions.
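
The relative-position property can be checked numerically. The sketch below assumes the standard RoPE construction, in which each pair of dimensions is rotated by an angle proportional to the token position; the `query` and `key` vectors are arbitrary example values.

```python
import math

def rotate(vector, position, base=10000.0):
    """Apply rotary position embedding to an even-length vector."""
    dim = len(vector)
    rotated = []
    for i in range(0, dim, 2):
        # Each pair of dimensions rotates at its own frequency.
        angle = position * base ** (-i / dim)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vector[i], vector[i + 1]
        rotated += [x * cos_a - y * sin_a, x * sin_a + y * cos_a]
    return rotated

def dot(a, b):
    """Dot product of two equally sized vectors."""
    return sum(p * q for p, q in zip(a, b))

query = [1.0, 0.5, -0.3, 2.0]
key = [0.2, -1.0, 0.7, 0.4]

# Same relative offset (3), different absolute positions.
score_near = dot(rotate(query, 5), rotate(key, 2))
score_far = dot(rotate(query, 105), rotate(key, 102))
print(score_near, score_far)
```

The two attention scores agree up to floating-point error, because the rotations cancel except for the difference between the two positions.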

No Biases

No biases were used in the dense kernels or layer norms, which increased training stability for large models. Dropping them also removes redundant parameters, which improves training efficiency and helps the model scale.

Model Variants

PaLM 2 comes in several variants of different sizes, each named after an animal:

  • Gecko is the smallest and fastest model and can run on edge devices such as mobile phones, even when offline.
  • Otter is larger than Gecko and can perform more complex tasks.
  • Bison is larger than Otter, is the stable model exposed through the PaLM API, and is widely used for text and chat.
  • Unicorn is the largest model in the family.

Model Parameter Settings

The model parameters let us modify the responses generated for our prompt. Let us go through them one by one:

Temperature

This influences the randomness of the model’s responses. A high temperature, closer to 1, results in more diverse and creative responses rather than a dry set of definitions. Suppose we want to understand the meaning of a particular word and its usage; in this case we don’t need a creative response but a dictionary meaning, so we can keep the temperature closer to 0 (near-deterministic responses). If we want to write a creative article or story, we can keep the temperature closer to 1.

[Figure: Example of temperature at value 0.5]
[Figure: Example of temperature at value 1]
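
The effect of temperature can be sketched with a softmax over next-word scores. The `logits` below are hypothetical values for three candidate words, and this mirrors the common way temperature is applied rather than PaLM’s internal implementation, which the source does not describe.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate words
for t in (0.1, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all of the probability lands on the top-scoring word, while at 1 the alternatives keep a meaningful share, which is why higher values read as more creative.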

Token Limit

A token is a chunk of text, and the token limit determines how much text a model can process at once. A larger token limit lets the model take in a broader scope of information at a time, while a smaller limit restricts the amount of text it can handle. For example, PaLM 2 can take about 8,000 tokens as input.

[Figure: Comparison of token limits for different LLM models]
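
For long podcasts, a transcript may exceed the token limit and need to be split before summarization. The sketch below is only a rough illustration: it assumes about 4 tokens for every 3 words, which is a rule of thumb and not how the PaLM tokenizer actually counts, and both helper names are hypothetical.

```python
def estimate_tokens(text):
    """Very rough token estimate: about 4 tokens for every 3 words (assumption)."""
    return (len(text.split()) * 4 + 2) // 3

def split_to_fit(text, token_limit):
    """Split text on word boundaries so each chunk stays within the estimate."""
    words_per_chunk = max(1, token_limit * 3 // 4)
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]

transcript_text = "word " * 100  # stand-in for a real transcript
chunks = split_to_fit(transcript_text, token_limit=40)
print(len(chunks), [estimate_tokens(chunk) for chunk in chunks])
```

Each chunk could then be summarized separately; for an exact count, the tokenizer of the model in use would be needed instead of this estimate.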

Top-K

When generating text, the model considers many possible words to follow the current one. Top-k sampling restricts the next-word choices to the k most likely words. A lower value of k makes the content more predictable, while a higher value makes it more varied.

[Figure: Example showing how the top-k and temperature parameters of an LLM work. Source: https://michaelehab.medium.com/the-secrets-of-large-language-models-parameters-how-they-affect-the-quality-diversity-and-32eb8643e631]

Top-P

This is the cumulative probability threshold for considering words, and it controls the diversity of the output. The model keeps adding candidates from the top-k choices, most likely first, until their total probability reaches the top-p value. This means that rather than focusing only on the few most likely words, the model may also accept less probable words if they are needed to reach the top-p probability, resulting in a more varied output. A higher top-p value results in a more diverse mix.
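
The interaction of top-k and top-p can be sketched as a filter over next-word probabilities. The probabilities below are invented for illustration, and the `top_k_top_p_filter` helper is hypothetical; it follows the common description of these settings rather than PaLM’s exact internals.

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix whose
    cumulative probability reaches top_p, and renormalize."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

next_word_probs = {"pizza": 0.5, "pasta": 0.25, "salad": 0.125, "soup": 0.125}
filtered = top_k_top_p_filter(next_word_probs, top_k=3, top_p=0.7)
print(filtered)
```

Here top-k first drops the fourth candidate, and top-p then stops after the second one because the first two already cover 0.75 of the probability. The next word would be sampled from the remaining, renormalized candidates.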

Maximum Outputs

This is the number of outputs generated for a given input. We can ask for more than one model response and then choose which one to use. In the image below, we get two responses for the same input when Max Outputs is set to 2.

[Figure: Example of LLM response when maximum outputs is set to 2]

Python Implementation of Podcast Text Highlighter

[Figure: Flowchart of the Python implementation]

1: Download Podcast Audio

We can download any podcast audio using this link by pasting in our podcast URL. Here, we use the Indian Express podcast URL.

2: Load and Install Libraries

!pip install openai-whisper
import whisper

3: Transcribe Audio to Text

Initially, we used the “tiny” model variant, and then we used the “base” variant, which is larger and gives better results in terms of spelling and grammar. We transcribe two audio podcasts.

Note: After downloading the mp3 audio of the podcast from the link mentioned above, upload it to your Colab environment files and paste the path of the audio file into the transcribe function as shown.

# Load the Whisper model
whisper_model = whisper.load_model("base")

# Transcribe audio
def transcribe(file_path: str) -> str:
    # `fp16` defaults to `True`, which tells the model to attempt to run on GPU.
    # We run this on the CPU for local demonstration purposes by setting it to `False`.
    transcription = whisper_model.transcribe(file_path, fp16=False)
    return transcription['text']

transcript = transcribe('/content/CATCH-UP-2023-Tenth-October-v1.mp3')
print(transcript)

Output

This is the catch-up on 3 things for the Indian Express, and I'm Flora Swine.
It's the 10th of October, and here are the headlines. 4 days after the Hamas attack, the
Israeli Army said today that they have regained control of the Gaza border.
It warned the inhabitants to flee to neighboring Egypt in a grim
reminder of the anticipated retaliation. The Israeli Army also
reported the discovery of the bodies of 1500 Hamas militants inside Israeli territory.
The ongoing conflict has claimed roughly 1,600 lives, with 900 casualties in
Israel and nearly 700 in Gaza. Meanwhile, Prime Minister Narendra Modi took to
extradite and said that he spoke with Israeli Prime Minister Benjamin Netanyahu,
assuring him that India stands firmly with Israel and is difficult to protect. He also
said that India strongly and unequivocally condemned terrorism in all its forms and
manifestations. Chief Justice of India, D.Y. Chandrachud, said today that the
The Supreme Court's role is not to micromanage issues that arise across the country. He
stressed that local issues are best left to the jurisdiction of the respective High
Court. He was presiding over a three-judge bench. The CGI Maynthese remarks while
hearing a matter related to captive elephants and said, Court, we have to
have a broader practical understanding as a court. What is the role of the Supreme Court in the
country? Not to deal with micromanagement of issues that arise all over the country.
Two militants linked to the terror outfit Lashkare Thaibarvak were killed in an encounter
with security forces in the Soviet district of Jaman Kashmir today. The encounter broke
out when the security forces launched an anti-militancy operation in the Al-Sipura area,
acting on intelligence regarding the presence of militants. The illness militants
have been identified as Morifat Magbul and Jazim Farok. Chintanubhadhai was sentenced
to life imprisonment today for his involvement in abetting and conspiring to murder
his estranged wife, Hema Obadhai, in 2015. The Sessions Court also imposed life
imprisonment sentences on three co-accused, namely Vijay Rajvahar, Pradeep Rajvahar,
and Shivkuma Rajvahar. On Saturday, the prosecution sought the death penalty for
all four individuals. The ICC Men's World Cup 2023 has two matches slated for today.
Pakistanis facing Shilankain Hagradwadwal Bangladesh is taking on England in Haramshalla.
In other World Cup news, New Zealand beats the Dutch to win their second game in a row at
the competition. The previously triumphed over defending Champions England in the
tournament opener, placing them at the top of the points table. This was a catch-up on
three things by the Indian Express.

4: Generate a Summary using Maker Suite

Now, we use one podcast transcript as training input, write its sample model response ourselves, and use the other transcript as test input. We go to the Maker Suite website and generate a bulleted summary.

[Figure: Maker Suite used to generate a bulleted text summary]

We adjust the model parameter settings to generate summaries.

5: Generate Code using Maker Suite

Generate the code using the API key of the PaLM API. We generated our own API key from the website.

"""
On the command line, only need to run once to install the package via pip:

$ pip install google-generativeai
"""

import google.generativeai as palm
palm.configure(api_key="API_KEY")

defaults = {
  'model': 'models/text-bison-001',
  'temperature': 1,
  'candidate_count': 1,
  'top_k': 40,
  'top_p': 0.95,
  'max_output_tokens': 1024,
  'stop_sequences': [],
  # Threshold 4 is the most permissive setting, so news about violence is not filtered out.
  'safety_settings': [{"category":"HARM_CATEGORY_DEROGATORY","threshold":4},{"category":"HARM_CATEGORY_TOXICITY","threshold":4},{"category":"HARM_CATEGORY_VIOLENCE","threshold":4},{"category":"HARM_CATEGORY_SEXUAL","threshold":4},{"category":"HARM_CATEGORY_MEDICAL","threshold":4},{"category":"HARM_CATEGORY_DANGEROUS","threshold":4}],
}

# Test input: the Whisper transcript of the 10th October episode (transcription errors left as-is).
Sentence = """This is the catch up on three things for the Indian Express and I'm Flora Swain. It's the 10th of October and here are the headlines. Four days after the Hamas attacked the Israeli army said today that they have regained control of the Gaza border. It warned the inhabitants there to flee to neighboring Egypt while they can in a grim reminder of the retaliation that is expected to follow. The Israeli army also reported the discovery of the bodies of 1500 Hamas militants inside Israeli territory. The ongoing conflict has claimed roughly 1600 lives with 900 casualties in Israel and nearly 700 in Gaza. Meanwhile, Prime Minister Narendra Modi took to X today and said that he spoke with Israeli Prime Minister Benjamin Netanyahu assuring him that India stands firmly with Israel and this difficult art. He also said that India strongly and unequivocally condemns terrorism in all its forms and manifestations. Chief Justice of India D.Y. Chandrachud said today that the Supreme Court's role is not to micromanage issues that arise across the country. He stressed that local issues are best left to the jurisdiction of the respective high courts. Prziding over a three-judge bench the CGI made these remarks while hearing a matter related to captive elephants and said, quote, we have to as a court have broader practical understanding. What is the role of the Supreme Court in the country? Not to deal with micromanagement of issues which arise all over the country. Unquote. Two militants linked to the terror outfit Lashkaretayabah were killed in an encounter with security forces in the Soapian District of Jammun Kashmir today. The encounter broke out after security forces launched an anti-militancy operation in the Alsepura area acting on intelligence regarding the presence of militants. The deceased militants have been identified as Mureffat Maghbul and Jasm Farukh. Chintanubhadi Haya was sentenced to life imprisonment today for his involvement in a betting and conspiring to murder his estranged wife, Hema Upadhyay in 2015. The Sessions Court also imposed life imprisonment sentences on three co-accused, namely Vijay Rajpur, Pradeep Rajpur and Shivkumar Rajpur. On Saturday the prosecution have sought the death penalty for all four individuals. The ICC men's World Cup 2023 has two matches slated for today. Pakistan is facing Sri Lanka in Hyderabad while Bangladesh is taking on England in Haramshalla. In other World Cup news New Zealand beat the Dutch to win their second game in a row at the competition. They previously triumphed over defending champions England in the tournament opener, placing them at the top of the points table. This was the Catchup on Three Things by the Indian Express."""

# Few-shot prompt: one worked example (the 11th October episode) followed by the test input.
prompt = f"""Transform a sentence into a bulleted list.
Sentence:  This is the catch up on three things for the Indian Express and I'm Flora Swain. It's the 11th of October and here are the headlines. Days after the Hamas attack, the Israeli military said that it is carrying out strikes in Lebanon after an anti-tank guided missile was fired from the neighboring country at one of its posts near the blue line. As for reports, there has been a massive buildup of troops along the Israel Gaza border as the country prepared for a ground invasion in the coming days. More than 2,000 people have lost their lives so far in the war which began on Saturday. The Supreme Court today took a serious exception to AIM's authorities seeking clarification of its order from the 9th of October which allowed the abortion of a 26-week-old fetus. The AIM's court cited some recent issues and asked why the issues were not conveyed to the court earlier when it had sought a medical opinion on the women's request seeking permission for medical termination of pregnancy. A special bench of justices, B.V. Nagaratma and Hema Kohli also pulled up the center for approaching Chief Justice of India D.Y. Chandrachud's bench on Tuesday against its order. Samajwadi party president Akhilesh Yadav was denied permission to go inside the J.K.N.R. and International Center to offer floral tribute to Freedom Fighter J.K.N.R. on his birth anniversary. Officials cited security reasons for not allowing the former UPCM into the center today. After he was denied permission, Akhilesh reads the building and jumped the center's boundary wall along with other SP leaders and workers. K.H.N.R. ensued at the spot while the police tried to stop them from entering the premises. The poster girl of Kerala's adult literacy program, K.R.Y.H.A. Amma, died at the age of 101 at her house in Alapurha today. In 2018, she made headlines by becoming the top scorer in the state literacy mission's flagship adult literacy program Akshana Laksham. At the age of 96, K.R.Y.H.A. scored 98 out of 100 marks in the exam that tested writing and mathematical skills. CM Pinery Vijayan in his condolence message said K.R.Y.A. was Kerala's pride and a model the person. Indian Afghanistan are battling each other in the ninth match of the ICC Cricket World Cup 2023 at the Orange JT Stadium in New Delhi today. India added to your favourites for this match having convincingly won their opening match against Australia. On the other hand, Afghanistan lost their opening match to Bangladesh but they will be looking to perform better against India. This was the Catch Up on Three Things by the Indian Express.
Bulleted: * Israeli military carried out strikes in Lebanon after an anti-tank guided missile was fired from the neighboring country.
* SC took a serious exception to AIIMS authorities seeking clarification of its order on abortion of a 26-week-old fetus.
* Akhilesh Yadav was denied permission to go inside the J.K.N.R. and International Center to offer floral tribute to Freedom Fighter J.K.N.R.
* Poster girl of Kerala's adult literacy program, K.R.Y.H.A. Amma, died at the age of 101.
* India Afghanistan are battling each other in the ninth match of the ICC Cricket World Cup 2023 at the Orange JT Stadium in New Delhi today.
Sentence: {Sentence}
Bulleted:"""

response = palm.generate_text(
  **defaults,
  prompt=prompt
)
print(response.result)
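
Once the model returns its bulleted text, it is often handy to turn it into a Python list, for example to store or display individual highlights. The `parse_bullets` helper below is a hypothetical post-processing step, not part of the PaLM API; it assumes bullets start with `*`, `-`, or `•`, and that the response text may be empty.

```python
def parse_bullets(response_text):
    """Split a bulleted model response into a list of clean highlight strings."""
    highlights = []
    for line in (response_text or "").splitlines():
        # Drop surrounding whitespace and any leading bullet characters.
        cleaned = line.strip().lstrip("*-•").strip()
        if cleaned:
            highlights.append(cleaned)
    return highlights

sample = "* First headline.\n * Second headline.\n\n- Third headline."
print(parse_bullets(sample))
```

In the pipeline above, this would be called as `parse_bullets(response.result)` after the `generate_text` call.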

Final Output

Below is the resulting output for our podcast. Most of the content is accurate apart from the spelling of proper nouns such as Dharamshala and Lashkar-e-Taiba.

[Figure: LLM output of the bulleted list]
  • The Israeli army regained control of the Gaza border and warned the inhabitants to flee to neighboring Egypt.
  • PM Narendra Modi spoke with Israeli PM Benjamin Netanyahu. India strongly condemns terrorism.
  • CJI DY Chandrachud said SC’s role is not to micromanage issues. Local issues are best left to HC.
  • 2 militants linked to Lashkarteayabah were killed in an encounter with security forces in J&K.
  • Chintanubhadi Haya was sentenced to life imprisonment for his involvement in betting and conspiring to murder his estranged wife, Hema Upadhyay in 2015.
  • ICC men’s World Cup 2023 has two matches slated for today.
  • Pakistan faces Sri Lanka in Hyderabad, while Bangladesh is taking on England in Hharamshalla.
  • New Zealand beat the Dutch to win their second game in the competition.

Conclusion

LLMs are powerful tools that can be combined with other tools to build quick prototypes, enabling us to test and experiment with various LLM use cases. Since LLMs are a very new technology, finding their potential uses requires a lot of back-and-forth experimentation. This is where tools like Maker Suite empower data science and analytics professionals to turn their ideas into code quickly, with minimal time and effort, so they can focus on fine-tuning and improving the data and other important components.

Key Takeaways

  • We learned about the basic concepts of the PaLM 2 API and its features.
  • We also looked at the various model parameter settings and how to tune them for a desired prompt output.
  • We saw the different aspects of the Google Maker Suite tool and used it to generate our LLM code.
  • We used Whisper and the PaLM API to generate relevant and accurate bulleted summaries of podcasts.

Frequently Asked Questions

Q1. Is the PaLM API free to use?

A. Yes, the PaLM API is open to the public and free to use for experimentation, but production use isn’t free.

Q2. What are the different models available in Maker Suite?

A. For now, Maker Suite only offers one model, Text-Bison.

Q3. Which is better, GPT-4 or the PaLM 2 API?

A. GPT-4 reportedly has around 1 trillion parameters (OpenAI has not confirmed the figure), compared to the 540 billion parameters of the original PaLM. It also supports multimodal features such as images as input. So GPT-4 offers more features and services.

Q4. Can we get responses in other languages?

A. PaLM supports responses in other languages, but this is available in only one model, which is not open for public preview and is a paid service.

Q5. What are the safety settings in the PaLM API?

A. The safety settings in the PaLM API block violent, derogatory, medical, or sexual content in the model responses. In our podcast summary, violent content was blocked at first, but once we changed the settings and relaxed the filter, we got an accurate output.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
