Python String Handling
String is a sequence of characters. A character is simply a symbol. For
example, the English language has 26 characters. Computers do not deal with
characters, they deal with numbers (binary). Even though you may see characters
on your screen, internally it is stored and manipulated as a combination of 0's
and 1's. This conversion of character to a number is called encoding, and the
reverse process is decoding. ASCII and Unicode are some of the popular encoding
used.
In Python, string is a sequence of Unicode character. Unicode was
introduced to include every character in all languages and bring uniformity in
encoding.
Creating a String
Strings can be created by enclosing characters inside a single quote or
double quotes. Even triple quotes can be used in Python but generally used to
represent multiline strings and docstrings.
# all of the following are equivalent
my_string = 'Hello'
my_string = "Hello"
my_string = '''Hello'''
my_string = """Hello"""
# triple quotes string can extend multiple lines
my_string = """Hello, welcome to
the
exciting world
of string
in Python"""
Accessing Characters in a String
We can access individual characters using indexing and a range of
characters using slicing. Index starts from 0. Trying to access a character out
of index range will raise an
IndexError
.
The index must be an integer. We can't use float or other types, this will
result into TypeError
.
Python allows negative indexing for its sequences. The index of -1 refers
to the last item, -2 to the second last item and so on. We can access a range
of items in a string by using the slicing operator (colon).
>>> my_string = 'programiz'
>>> my_string[0] # 1st character
'p'
>>> my_string[-1] # last character
'z'
>>> my_string[15] # index must be in range
...
IndexError: string index out of range
>>> my_string[1.5] # index must be an integer
...
TypeError: string indices must be integers
>>> my_string[1:5] # slicing 2nd to 5th character
'rogr'
>>> my_string[5:-2] # slicing 6th to 7th character
'am'
Slicing can be best visualized by considering the index to be between the
elements as shown below. So if we want to access a range, we need the index
that will slice the portion from the string.
Changing or Deleting a String
Strings are immutable. This means that elements of a string cannot be
changed once it has been assigned. We can simply reassign different strings to
the same name.
>>> my_string = 'programiz'
>>> my_string[5] = 'a'
...
TypeError: 'str' object does not support item assignment
>>> my_string = 'Python'
>>> my_string
'Python'
We cannot delete or remove characters from a string. But deleting the
string entirely is possible using the keyword
del
.
>>> del my_string[1]
...
TypeError: 'str' object doesn't support item deletion
>>> del my_string
>>> my_string
...
NameError: name 'my_string' is not defined
Python String Operations
There are many operations that can be performed with string which makes
it one of the most used datatypes in Pyhon.
Concatenation
Joining of two or more strings into a single one is called concatenation.
The
+
operator
does this in Python. Simply writing two string literals together also
concatenates them. The *
operator can be used to repeat the
string for a given number of times. Finally, if we want to concatenate strings
in different lines, we can use parentheses.
>>> # using +
>>> 'Hello ' + 'World!'
'Hello World!'
>>> # two string literals together
>>> 'Hello ''World!'
'Hello World!'
>>> # using *
>>> 'Hello ' * 3
'Hello Hello Hello '
>>> # using parentheses
>>> s = ('Hello '
... 'World')
>>> s
'Hello World'
Iterating Through String
Using
for
loop we can iterate through a string.
Here is an example to count the number of 'l' in a string.
>>> count = 0
>>> for letter in 'Hello World':
... if(letter == 'l'):
... count += 1
...
>>> print(count,'letters found')
3 letters found
String Membership Test
We can test if a sub string exists within a string or not, using the
keyword
in
.
>>> 'a' in 'program'
True
>>> 'at' not in 'battle'
False
Built-in functions
Various built-in functions that work with sequence, works with string as
well. Some of the commonly used ones are
enumerate()
and len()
.
The enumerate()
function returns an enumerate object.
It contains the index and value of all the items in the string as pairs. This
can be useful for iteration. Similarly, len()
returns the length (number of
characters) of the string.
>>> list(enumerate('cold'))
[(0, 'c'), (1, 'o'), (2, 'l'), (3, 'd')]
>>> long_word = 'Pneumonoultramicroscopicsilicovolcanoconiosis'
>>> len(long_word)
45
Python String Formatting
Escape Sequence
If we want to print a text like -He said, "What's there?"- we
can neither use single quote or double quotes. This will result into
SyntaxError
as the text itself contains both
single and double quotes.
>>> print("He said, "What's there?"")
...
SyntaxError: invalid syntax
>>> print('He said, "What's there?"')
...
SyntaxError: invalid syntax
One way to get around this problem is to use triple quotes.
Alternatively, we can use escape sequences.
An escape sequence starts with a backslash and is interpreted
differently. If we use single quote to represent a string, all the single
quotes inside the string must be escaped. Similar is the case with double
quotes. Here is how it can be done to represent the above text.
>>> # using triple quotes
>>> print('''He said, "What's there?"''')
He said, "What's there?"
>>> # escaping single quotes
>>> print('He said, "What\'s there?"')
He said, "What's there?"
>>> # escaping double quotes
>>> print("He said, \"What's there?\"")
He said, "What's there?"
Here is a list of all the escape sequence supported by Python.
Here are some examples
>>> print("C:\\Python32\\Lib")
C:\Python32\Lib
>>> print("This is printed\nin two lines")
This is printed
in two lines
>>> print("This is \x48\x45\x58 representation")
This is HEX representation
Raw String
Sometimes we may wish to ignore the escape sequences inside a string. To
do this we can place
r
or R
in front of the string. This will
imply that it is a raw string and any escape sequence inside it will be
ignored.
>>> print("This is \x61 \ngood example")
This is a
good example
>>> print(r"This is \x61 \ngood example")
This is \x61 \ngood example
The format() Method
The
format()
method that is available with the
string object is very versatile and powerful in formatting strings. Format
strings contains curly braces {} as placeholders or replacement fields which
gets replaced. We can use positional arguments or keyword arguments to specify
the order.
>>> # default(implicit) order
>>> "{}, {} and {}".format('John','Bill','Sean')
'John, Bill and Sean'
>>> # order using positional argument
>>> "{1}, {0} and {2}".format('John','Bill','Sean')
'Bill, John and Sean'
>>> # order using keyword argument
>>> "{s}, {b} and {j}".format(j='John',b='Bill',s='Sean')
'Sean, Bill and John'
The
format()
method can have optional format
specifications. They are separated from field name using colon. For example, we
can left-justify <
,
right-justify >
or center ^
a string in the given space. We can
also format integers as binary, hexadecimal etc. and floats can be rounded or
displayed in the exponent format. There are a ton of formatting you can use.
Visit here for all the string formatting available with
the format()
method.
>>> # formatting integers
>>> "Binary representation of {0} is {0:b}".format(12)
'Binary representation of 12 is 1100'
>>> # formatting floats
>>> "Exponent representation: {0:e}".format(1566.345)
'Exponent representation: 1.566345e+03'
>>> # round off
>>> "One third is: {0:.3f}".format(1/3)
'One third is: 0.333'
>>> # string alignment
>>> "|{:<10}|{:^10}|{:>10}|".format('butter','bread','ham')
'|butter | bread | ham|'
Old style formatting
We can even format strings like the old
sprintf()
style used in C programming language.
We use the %
operator to accomplish this.
>>> x = 12.3456789
>>> print('The value of x is %3.2f' %x)
The value of x is 12.35
>>> print('The value of x is %3.4f' %x)
The value of x is 12.3457
String methods
The string
class in Python has numerous methods that provide diverse functionality. There
are 44 methods in the current version (3.2). All the methods are listed below
along with brief descriptions.
No comments:
Post a Comment