Thought & Theory

In Theory

Holy Performance! How to make python string concatenation fast

I hear that operator based string concatenation (”hello” + ” world”) is often the newbie’s approach to combining strings in Python but it’s not every day that I have to create huge strings from 35MB text files. Why am I creating huge strings? Well, one insert is better than 30,000!

Originally, my script executed one mysql INSERT per line which was approximately 30,000 INSERTs; it took about 10 minutes to run. One of the negative things about an ORM is that they make it terribly easy to write inefficient code. It’s quite handy to create one object per line of data and play with it, quickly build relations, and save it without having to write any ugly SQL but because the ORM essentially hides the fact that you’re working with a database, sometimes your pretty object-oriented code sends exhaustive messages to your database.

So I refactored my code, and now I have one massive INSERT statement which gets sent to the database. My early tests on small datasets looked good but when I tested my code with the full dataset, it never finished. After commenting out different sections of code, I found the nasty beast which was slowing me down: string concatenation.

Thanks to a survey of different methods for string concatenation I changed my code again and finally got my 10 minute script to run in 90 seconds.

According to the survey, the string method `join` is very fast and using it with a list comprehension makes the total operation even faster. Next time you’re combining strings, remember:

' '.join([a for a in ['a','join','will','make','it','faster!']])

Comments are closed.