Functions I

We’ll continue with the code from the Enumerate chapter. Splitting the text with split('.') was not perfect: it drops the dots and can leave empty strings in the result. So we’ll ask an LLM (Mistral) for an improved version:

Use python to split a string into sentences. Split at the dot. Include the dot into the new sentences. Remove empty strings.

mistralai/Mistral-Small-24B-Instruct-2501

To split a string into sentences in Python, including the dot in each sentence, you can use the split method along with a regular expression to ensure that the dot is included in each resulting sentence. Here’s how you can do it:

import re

def split_into_sentences(text):
    # Use regular expression to split by dot followed by a space or end of string
    sentences = re.split(r'(?<=\.)\s*', text)
    
    # Remove empty strings from the list of sentences
    sentences = [sentence for sentence in sentences if sentence]

    return sentences

# Example usage
text = "This is the first sentence. Here comes the second sentence. And here is the third one."
sentences = split_into_sentences(text)
for sentence in sentences:
    print(sentence)
This is the first sentence.
Here comes the second sentence.
And here is the third one.
['This is the first sentence.',
 'Here comes the second sentence.',
 'And here is the third one.']

In this code:

  • The re.split function is used to split the text.

  • The regular expression (?<=\.)\s* ensures that the split occurs at a dot followed by any amount of whitespace (including zero whitespaces).

  • The split_into_sentences function returns a list of sentences, each including the dot.

This approach will handle cases where sentences are followed by spaces or no spaces after the dot.
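
As a quick check of this claim, the following sketch runs the generated function on input without spaces after the dots. It also shows a limitation that follows from the pattern: every dot is treated as a sentence end, so a decimal number is split as well. The sample strings are made up for illustration, and splitting on zero-width matches assumes Python 3.7 or newer.

```python
import re

def split_into_sentences(text):
    # Same pattern as in the generated code above
    sentences = re.split(r'(?<=\.)\s*', text)
    return [sentence for sentence in sentences if sentence]

# Made-up sample strings (not from the chapter)
no_spaces = split_into_sentences("One.Two. Three.")
decimal = split_into_sentences("Version 3.12 is out.")

print(no_spaces)  # sentences are separated even without a space after the dot
print(decimal)    # the dot inside "3.12" also counts as a sentence end
```

For the short poem-like texts used in this chapter that limitation does not matter, but it is worth keeping in mind for texts with abbreviations or numbers.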

Functions

The generated code includes a function, introduced with the def keyword. A function is a named block of code. Defining a function does not run it; the body is executed only when the function is called. Like a variable, a function must be defined before it can be called. (We’ll deal with functions in more detail later.)

def print_words(text):
    # split text into words 
    words = text.split(' ')
    for word in words:
        print(word)
    # print('\n') emits two line breaks: the '\n' itself plus the one print() always appends
    print('\n')

# now we can call the function 
print_words("hello world")
# we can call it multiple times
s = "Iterate and count, Loop through tasks with steady play, End when done repeating."
print_words(s)
hello
world


Iterate
and
count,
Loop
through
tasks
with
steady
play,
End
when
done
repeating.
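
The rule that a function must be defined before it is called can be tried out with a small sketch. The function name greet is hypothetical and not part of the chapter's code; the try/except is only there so the example keeps running after the expected error.

```python
# Hypothetical example: 'greet' is not a name from the chapter.
try:
    greet("world")  # greet has not been defined yet
except NameError as error:
    message = str(error)
    print(message)

def greet(name):
    # The body runs only when greet() is called, not when it is defined.
    return "hello " + name

# After the definition, the call works.
print(greet("world"))
```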

The function expects a value for its text parameter. Calling it without one raises a TypeError:

print_words()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 1
----> 1 print_words()

TypeError: print_words() missing 1 required positional argument: 'text'
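
If a call without input should be allowed, one option is to give the parameter a default value. The sketch below is a variant of print_words, not the chapter's version; the name print_words_with_default and the empty-string default are assumptions made for illustration.

```python
def print_words_with_default(text=""):
    # 'text' falls back to an empty string when no argument is passed
    words = text.split(' ')
    for word in words:
        print(word)

print_words_with_default()               # no TypeError, prints one empty line
print_words_with_default("hello world")  # prints each word on its own line
```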

Improved code with the function

from fpdf import FPDF 
import re

def split_into_sentences(text):
    # Split after every dot (the lookbehind keeps the dot) and drop any whitespace that follows
    sentences = re.split(r'(?<=\.)\s*', text)
    
    # Remove empty strings from the list of sentences
    sentences = [sentence for sentence in sentences if sentence]

    return sentences


pdf = FPDF(format=(115, 180))
pdf.set_font('Helvetica')

# Store a longer text in the variable 'text'. 
text = "Hello, world, with your shades of green and blue, You're here to stay, the morning dew. The vast expanse, the creatures, trees, Each tiny grain of sand at the sea's breeze. A symphony of life, a dance so free, Hello, world, in you we breathe."
# Split text into sentences
sentences = split_into_sentences(text)

for index, sentence in enumerate(sentences):

    pdf.add_page()
    pdf.set_font_size(36)
    pdf.multi_cell(w=0, text=sentence, align='L')
    pdf.set_y(-24)
    pdf.set_font_size(10)
    pdf.cell(w=0, text=str(index+1), align='R')
    
pdf.output("hello_world.pdf")
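
To see what the loop hands to the PDF calls without installing fpdf, the pairing of page numbers and sentences can be printed directly. This is a minimal sketch with a shortened, made-up sample text; the variable names pages and page_number are assumptions for illustration. It also shows that enumerate accepts a start argument, which would be an alternative to writing index+1.

```python
import re

def split_into_sentences(text):
    # Same function as above, written more compactly
    sentences = re.split(r'(?<=\.)\s*', text)
    return [sentence for sentence in sentences if sentence]

# Shortened sample text (not the chapter's full text)
text = "Hello, world. You're here to stay. In you we breathe."
sentences = split_into_sentences(text)

# Pair each sentence with its page number, as the PDF loop does with index+1
pages = [(index + 1, sentence) for index, sentence in enumerate(sentences)]

# enumerate(..., start=1) yields the same numbering without the manual +1
pages_with_start = list(enumerate(sentences, start=1))

for page_number, sentence in pages:
    print(page_number, sentence)
```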

[Image: enumerate gif]