Functions I
We’ll continue with the code from the Enumerate chapter. Splitting the sentences with split('.') was not perfect, so we’ll ask an LLM (Mistral) for an improvement:
Use python to split a string into sentences. Split at the dot. Include the dot into the new sentences. Remove empty strings.
mistralai/Mistral-Small-24B-Instruct-2501
To split a string into sentences in Python, including the dot in each sentence, you can use the split method along with a regular expression to ensure that the dot is included in each resulting sentence. Here’s how you can do it:
import re

def split_into_sentences(text):
    # Use regular expression to split by dot followed by a space or end of string
    sentences = re.split(r'(?<=\.)\s*', text)
    # Remove empty strings from the list of sentences
    sentences = [sentence for sentence in sentences if sentence]
    return sentences

# Example usage
text = "This is the first sentence. Here comes the second sentence. And here is the third one."
sentences = split_into_sentences(text)
for sentence in sentences:
    print(sentence)
This is the first sentence.
Here comes the second sentence.
And here is the third one.
['This is the first sentence.',
'Here comes the second sentence.',
'And here is the third one.']
In this code:
The re.split function is used to split the text.
The regular expression (?<=\.)\s* ensures that the split occurs at a dot followed by any amount of whitespace (including none).
The split_into_sentences function returns a list of sentences, each including the dot.
This approach will handle cases where sentences are followed by spaces or no spaces after the dot.
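To see what the lookbehind pattern does on other inputs, here is a small sketch. The sample strings are made up for illustration, and splitting on a pattern that can match the empty string needs Python 3.7 or later. The result suggests that the pattern splits after every dot, so a decimal number such as 3.14 is also cut in two:

```python
import re

def split_into_sentences(text):
    # Split after every dot; whitespace following the dot is dropped
    sentences = re.split(r'(?<=\.)\s*', text)
    # Remove empty strings (for example the one after the final dot)
    return [sentence for sentence in sentences if sentence]

# Hypothetical sample inputs, not taken from the chapter
no_space = split_into_sentences("One.Two. Three.")
decimal = split_into_sentences("Pi is 3.14. Done.")

print(no_space)  # ['One.', 'Two.', 'Three.']
print(decimal)   # ['Pi is 3.', '14.', 'Done.']
```

For plain prose like the texts in this chapter, this behaviour is good enough; texts with decimals or abbreviations would need a more careful pattern.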
Functions
The generated code includes a function, introduced with the def keyword. A function is a named block of code (the name works like a variable name). A function is callable, and its body is executed only when the function is called. Like a variable, a function needs to be defined before it can be called. (We’ll deal with functions in more detail later.)
def print_words(text):
    # split text into words
    words = text.split(' ')
    for word in words:
        print(word)
    # print a newline character after the loop
    print('\n')

# now we can call the function
print_words("hello world")

# we can call it multiple times
s = "Iterate and count, Loop through tasks with steady play, End when done repeating."
print_words(s)
hello
world
Iterate
and
count,
Loop
through
tasks
with
steady
play,
End
when
done
repeating.
The function requires an argument as input; calling it without one raises an error:
print_words()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 print_words()
TypeError: print_words() missing 1 required positional argument: 'text'
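The rules described in this section can be tried out in a short sketch. The names greet, shout, and calls are made up for this illustration. The point is only that the body of a function runs when it is called, not when it is defined, and that a name must be defined before the call is executed:

```python
calls = []  # records every time the function body actually runs

def greet(name):
    calls.append(name)
    return "Hello, " + name + "!"

# Defining greet did not run its body
calls_before = len(calls)

message = greet("world")
calls_after = len(calls)

# Calling without the required argument raises a TypeError
try:
    greet()
    missing_argument_failed = False
except TypeError:
    missing_argument_failed = True

# shout is only defined further down, so calling it here fails
try:
    shout("hi")
    early_call_failed = False
except NameError:
    early_call_failed = True

def shout(text):
    return text.upper() + "!"

print(calls_before, calls_after)  # 0 1
print(message)                    # Hello, world!
print(missing_argument_failed)    # True
print(early_call_failed)          # True
print(shout("hi"))                # HI!
```

Catching the errors with try and except is used here only so that the sketch runs to the end; in a notebook cell the errors would show up as tracebacks like the one above.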
Improved code with the function
from fpdf import FPDF
import re

def split_into_sentences(text):
    # Split after every dot and drop any whitespace that follows it
    sentences = re.split(r'(?<=\.)\s*', text)
    # Remove empty strings from the list of sentences
    sentences = [sentence for sentence in sentences if sentence]
    return sentences

pdf = FPDF(format=(115, 180))
pdf.set_font('Helvetica')

# Store a longer text in the variable 'text'.
text = "Hello, world, with your shades of green and blue, You're here to stay, the morning dew. The vast expanse, the creatures, trees, Each tiny grain of sand at the sea's breeze. A symphony of life, a dance so free, Hello, world, in you we breathe."

# Split text into sentences
sentences = split_into_sentences(text)

for index, sentence in enumerate(sentences):
    # One page per sentence
    pdf.add_page()
    pdf.set_font_size(36)
    pdf.multi_cell(w=0, text=sentence, align='L')
    # Page number near the bottom of the page
    pdf.set_y(-24)
    pdf.set_font_size(10)
    pdf.cell(w=0, text=str(index+1), align='R')

pdf.output("hello_world.pdf")
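Before rendering the PDF, it can help to check which sentence lands on which page. The following sketch leaves out fpdf entirely and only pairs page numbers with sentences. The sample text and the names pages and number are assumptions made for this illustration:

```python
import re

def split_into_sentences(text):
    # Split after every dot and drop any whitespace that follows it
    sentences = re.split(r'(?<=\.)\s*', text)
    # Remove empty strings from the list of sentences
    return [sentence for sentence in sentences if sentence]

# Hypothetical short sample instead of the poem
sample = "First page. Second page. Third page."

# enumerate starts counting at 0, so add 1 to get a page number
pages = [(index + 1, sentence)
         for index, sentence in enumerate(split_into_sentences(sample))]

for number, sentence in pages:
    print(number, sentence)
# 1 First page.
# 2 Second page.
# 3 Third page.
```

The same index + 1 expression is what ends up as the page number in the bottom right corner of each PDF page.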