Implementing User Comments with SQLAlchemy - az-links.info
Disqus and Facebook are popular services that allow you to embed comments in your application.
To map this relationship on SQLAlchemy, we use the relationship function and its parameters. The official Relationships API documentation provides a complete explanation of these parameters and also covers other parameters not mentioned here.
The last type supported by SQLAlchemy, Many To Many, is used when instances of a particular class can have zero or more associations to instances of another class. For example, let's say that we are mapping the relationship of instances of Student and instances of Class in a system that manages a school. As many students can participate in many classes, we map the relationship through a helper table. To make SQLAlchemy aware of the helper table, we pass it in the secondary parameter of the relationship function.
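A minimal sketch of such a Many To Many mapping follows. The table name students_classes, the column names, and the in-memory SQLite engine are assumptions made for illustration; they are not taken from the original listing.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import relationship, sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # SQLAlchemy 1.3

Base = declarative_base()

# Helper (association) table; its name and columns are illustrative.
students_classes = Table(
    "students_classes",
    Base.metadata,
    Column("student_id", Integer, ForeignKey("students.id"), primary_key=True),
    Column("class_id", Integer, ForeignKey("classes.id"), primary_key=True),
)


class Student(Base):
    __tablename__ = "students"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # The helper table is passed in the secondary parameter.
    classes = relationship("Class", secondary=students_classes, back_populates="students")


class Class(Base):
    __tablename__ = "classes"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    students = relationship("Student", secondary=students_classes, back_populates="classes")


engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

math = Class(title="Math")
alice = Student(name="Alice", classes=[math])
bob = Student(name="Bob", classes=[math])
session.add_all([alice, bob])
session.commit()

class_titles = [c.title for c in alice.classes]
student_names = sorted(s.name for s in math.students)
```

Both sides of the relationship can be navigated: alice.classes lists the classes of one student, and math.students lists the students of one class.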
The above code snippets show just a subset of the mapping options supported by SQLAlchemy. In the following sections, we are going to take a more in-depth look into each one of the available relationship patterns.
Besides that, the official documentation is a great reference to learn more about relationship patterns on SQLAlchemy.
Whenever rows in a database are updated or deleted, the rows that reference them may need to change as well. These changes can be simple updates, which are called cascade updates, or full deletes, known as cascade deletes. Without them, we will end up with a lot of garbage and unfulfilled references in our database. To make this kind of operation easy to maintain, SQLAlchemy ORM enables developers to map cascade behavior when using relationship constructs.
The cascade options can indicate that when a parent object is deleted, children of this object will be deleted as well; that when a child object loses reference to a parent, it will get deleted; or that merge operations propagate from parent to children. If more information about this feature is needed, the SQLAlchemy documentation provides an excellent chapter about Cascades.
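The following sketch shows two of these behaviors, using SQLAlchemy's "all, delete-orphan" cascade setting on a relationship. The Parent and Child classes and the in-memory SQLite engine are hypothetical and exist only for this example.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import relationship, sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # SQLAlchemy 1.3

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parents"
    id = Column(Integer, primary_key=True)
    # "all" includes delete; "delete-orphan" removes children that lose their parent.
    children = relationship("Child", cascade="all, delete-orphan", back_populates="parent")


class Child(Base):
    __tablename__ = "children"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parents.id"))
    parent = relationship("Parent", back_populates="children")


engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

parent = Parent(children=[Child(), Child()])
session.add(parent)
session.commit()

# Removing a child from the collection orphans it, so it is deleted on commit.
parent.children.pop()
session.commit()
children_after_orphan = session.query(Child).count()

# Deleting the parent cascades to the remaining child.
session.delete(parent)
session.commit()
children_after_delete = session.query(Child).count()
```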
As explained by Martin Fowler, a Unit of Work is used to maintain a list of objects affected by a business transaction and to coordinate the writing out of these changes. This means that all modifications tracked by a Session (its Unit of Work) will be applied to the underlying database together, or none of them will.
In other words, Sessions are used to guarantee the database consistency. The official SQLAlchemy ORM documentation about Sessions gives a great explanation of how changes are tracked, how to get sessions, and how to create ad-hoc sessions. However, in this article, we will use the most basic form of session creation: we create a session factory that is bound to the SQLAlchemy engine. After that, we can just issue calls to this session factory to get our sessions.
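As a rough sketch of that basic form, the snippet below binds a session factory to an engine and uses one session to show the all-or-nothing behavior. The Note class and the in-memory SQLite engine are assumptions made so the example runs without a PostgreSQL server.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # SQLAlchemy 1.3

Base = declarative_base()


class Note(Base):  # hypothetical class, used only to have something to persist
    __tablename__ = "notes"
    id = Column(Integer, primary_key=True)
    text = Column(String)


engine = create_engine("sqlite://")  # stand-in for a PostgreSQL engine
Base.metadata.create_all(engine)

# The session factory is bound to the engine; calling it returns a new session.
Session = sessionmaker(bind=engine)
session = Session()

session.add(Note(text="kept"))
session.commit()  # changes tracked so far are written together

session.add(Note(text="discarded"))
session.rollback()  # pending changes are dropped together

texts = [note.text for note in session.query(Note).order_by(Note.id)]
session.close()
```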
In the following sections, we will create a small project based on pipenv, a Python dependency manager, and add some classes to it. Then we will map these classes to tables persisted to a PostgreSQL database and learn how to query data.
Starting the Tutorial Project
To create our tutorial project, we have to have Python installed on our machine and pipenv installed as a global Python package.
Installing pipenv and setting up the project are done from the command line. These commands depend on Python, so be sure to have it installed before proceeding.
There are many ways to get an instance of PostgreSQL. Another possibility is to install PostgreSQL locally on our current environment. The third option, running PostgreSQL in a Docker instance, is probably the best choice because it has the performance of an instance running locally, it's free forever, and because it's easy to create and destroy Docker instances.
The only small disadvantage is that we need to install Docker locally. After having Docker installed, we can create and destroy dockerized PostgreSQL instances from the command line. The parameters passed to Docker define the name of the Docker instance, the password to connect to PostgreSQL, the user to connect to PostgreSQL, the main and only database available in the PostgreSQL instance, and that the local port will tunnel connections to the same port in the Docker instance.
To install these dependencies, we will use pipenv. Note that to run the scripts that we are going to create, we first need to spawn the virtual environment shell. That is, before executing python somescript.
With nested sets, each comment in the thread is given left, right and level values. If you sort the results by left, you get them in the correct threaded order, and then you can use level to determine the indentation to use when you render them on a web page. The big advantage of this method versus adjacency lists is that you can get the comments in the correct threaded order with a single database query, and even use pagination to get a subset of the thread.
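To make the left, right and level values concrete, here is a small pure-Python sketch that numbers a made-up comment thread with a depth-first walk. The thread contents and the function name number_thread are hypothetical, and the walk only illustrates the numbering scheme; it is not a database implementation.

```python
# A hypothetical thread: two top-level comments, the first one with replies.
thread = [
    {"text": "A", "replies": [
        {"text": "A1", "replies": [{"text": "A1a"}]},
        {"text": "A2"},
    ]},
    {"text": "B"},
]


def number_thread(comments):
    """Assign left, right and level values with a depth-first walk."""
    rows = []
    counter = 0

    def visit(comment, level):
        nonlocal counter
        counter += 1
        row = {"text": comment["text"], "left": counter, "level": level}
        rows.append(row)
        for reply in comment.get("replies", []):
            visit(reply, level + 1)
        counter += 1
        row["right"] = counter  # assigned after all replies are numbered

    for comment in comments:
        visit(comment, 0)
    return rows


rows = sorted(number_thread(thread), key=lambda row: row["left"])
for row in rows:
    print("  " * row["level"] + row["text"], row["left"], row["right"])
```

Sorting by left yields the threaded order, and level gives the indentation. Inserting a new reply anywhere but the end would shift the left and right values of every comment numbered after it.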
You may be thinking that this is actually a pretty clever solution that nicely addresses this problem, but have you considered what the algorithm looks like to assign these three numbers to each comment?
That is where the problem with this solution lies. Each time a new comment is added, a potentially large portion of the comments table will have to be updated with new left and right values. When you work with adjacency lists, the insertions are cheap and the queries are expensive. With nested sets it is the reverse: insertions are expensive and queries are cheap.
I have never implemented this algorithm myself, so I do not have example code readily available to show you how it looks, but if you want to see a real-world implementation, the django-mptt project is a great example that works with the Django ORM.
You can probably guess that queries are fairly simple from the examples above, but the logic required to insert a new comment is complex and highly inefficient, as a large number of comments might need to be updated depending on where in the tree the new comment is inserted. This solution only makes sense in cases where insertions are uncommon and queries are frequent.
Thinking Outside the Box
Unfortunately none of the solutions above worked well for my needs. I came up with a completely different approach that has both efficient inserts and queries, but in exchange has other, less severe limitations.
This solution adds a single column of type text, which I'm going to name path. Each comment is also numbered with an increasing counter, so the first comment gets a 1, the second gets a 2, and so on. The content of path for a top-level comment is the value of this counter. But for a reply, path is set to the path of the parent with the counter appended at the end. With path values assigned this way, the comments can be entered in any order: if I run a query on this table that sorts rows by path, I get the correct threaded order.
And to know what level of indentation each comment needs to have, I can look at how many components the path has. I just need to have a way to generate a unique and increasing value to assign to the new comment, which I can, for example, steal from the database-assigned id.
I also need to know the parent of the comment, so that I can take its path and use it when creating the path of the child comment.
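The path idea can be tried out in plain Python before touching the database. In this sketch the "." separator between components, the sample comments, and the helper name make_path are all assumptions chosen for illustration.

```python
def make_path(comment_id, parent_path=None):
    """Top-level comments use their own id; replies append it to the parent's path."""
    if parent_path is None:
        return str(comment_id)
    return f"{parent_path}.{comment_id}"


# Comments in the order they were entered: (id, parent id, text).
entered = [
    (1, None, "first"),
    (2, None, "second"),
    (3, 1, "reply to first"),
    (4, 3, "reply to the reply"),
    (5, 2, "reply to second"),
    (6, 1, "another reply to first"),
]

paths = {}
for comment_id, parent_id, _text in entered:
    parent_path = paths[parent_id] if parent_id is not None else None
    paths[comment_id] = make_path(comment_id, parent_path)

# Sorting by path gives the threaded order; the number of separators gives the level.
threaded = sorted(paths.values())
levels = [path.count(".") for path in threaded]
```

Adding a comment only requires its own id and the path of its parent, so no existing row has to be rewritten.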
Queries are also cheap. By adding an index on the path column, I can get the comments in the correct threaded order very efficiently, just by sorting by path.
And I can also paginate the list. So if this is all so great, what is the bad news? Take a look at the path assignments in the example above and see if you can spot the limitation. Did you find it? How many comments do you think this system supports?
In the way I constructed this example, you can't really have more than 10 comments (or actually 9, unless you start counting from 0). Sorting by path only works when the numbers that are used in the path field all have the same number of digits, in this example just one.
Once a 10 appears, the sorting breaks, because I'm using strings, so 10 sorts between 1 and 2 and not after 9. So what's the solution? Let's allocate two digits for each component in the path, padding with zeros. But of course this is still too limiting, so instead of two digits you will probably want to use more. If you use six digits, for example, you can have up to a million comments before you run into problems.
And if you find that you are approaching the limit with the number of digits that you used, you could take the comments offline for maintenance, regenerate the paths with more digits and then you would be okay again.
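The sorting problem and the fixed-width fix are easy to reproduce with ordinary string sorting. The sample ids below are made up, and six digits is simply the width mentioned above.

```python
ids = [1, 2, 9, 10]

# Compared as plain strings, "10" sorts right after "1" and before "2".
unpadded_order = sorted(ids, key=str)

# Zero-padding every component to a fixed width restores numeric order.
DIGITS = 6
padded_order = sorted(ids, key=lambda i: f"{i:0{DIGITS}d}")

# The largest id that still fits in the chosen width.
largest_id = 10 ** DIGITS - 1
```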
The implementation is actually not that bad.
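What follows is one possible sketch of such an implementation with SQLAlchemy, not the author's original code. It assumes a path column combined with a parent_id column, a "." separator, six digits per component, a hypothetical save_comment helper, and an in-memory SQLite database.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Text, create_engine
from sqlalchemy.orm import relationship, sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # SQLAlchemy 1.3

Base = declarative_base()
PATH_DIGITS = 6  # assumed width of each path component


class Comment(Base):
    __tablename__ = "comments"
    id = Column(Integer, primary_key=True)
    text = Column(String(140))
    path = Column(Text, index=True)  # indexed so sorting by path stays cheap
    parent_id = Column(Integer, ForeignKey("comments.id"))
    parent = relationship("Comment", remote_side=[id], back_populates="replies")
    replies = relationship("Comment", back_populates="parent")

    def level(self):
        return self.path.count(".")


def save_comment(session, comment):
    """Hypothetical helper: flush to obtain the id, then derive the path from it."""
    session.add(comment)
    session.flush()  # the database assigns comment.id here
    prefix = comment.parent.path + "." if comment.parent else ""
    comment.path = prefix + "{:0{}d}".format(comment.id, PATH_DIGITS)
    session.commit()


engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

first = Comment(text="first")
save_comment(session, first)
second = Comment(text="second")
save_comment(session, second)
reply = Comment(text="reply to first", parent=first)
save_comment(session, reply)

ordered = session.query(Comment).order_by(Comment.path).all()
ordered_texts = [c.text for c in ordered]
ordered_levels = [c.level() for c in ordered]
```

A single query ordered by path returns the thread in display order, and level() gives the indentation for each comment.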
I decided to combine this solution with the adjacency list option I presented above, as that gives me an easy and efficient way to obtain the parent given a comment (I could do away with the adjacency list and extract the parent id from the path field, but that seems overly complicated).
Subquery eager loading is detailed at Subquery Eager Loading. An introduction to raise loading is at Preventing unwanted lazy loads using raiseload. This is configured using the relationship.
For example, a relationship can be configured to use joined eager loading when the parent object is queried. See Joined Eager Loading for background on this style of loading. By default, relationships are lazy loaded. See Lazy Loading for further background.
Very detailed control over relationship loading is available using loader options; the most common are joinedload(), subqueryload(), selectinload() and lazyload(). The option accepts either the string name of an attribute against a parent, or for greater specificity can accommodate a class-bound attribute directly. Loader options can also be chained, including onto a lazily loaded attribute. This means that when a collection or association is lazily loaded upon access, the specified option will then take effect. When the children collection on a particular Parent object is first accessed, it will lazy load the related objects, but additionally apply eager loading to the subelements collection on each member of children.
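A short sketch of per-query loader options is shown below. The Parent, Child and SubElement classes are hypothetical stand-ins for the ones the text refers to, and the check relies on the unloaded attribute of the object returned by inspect() to see which relationships have been populated.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, inspect
from sqlalchemy.orm import joinedload, relationship, selectinload, sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # SQLAlchemy 1.3

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # Default lazy loading; passing lazy="joined" here would instead make
    # joined eager loading the mapping-level default.
    children = relationship("Child")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))
    subelements = relationship("SubElement")


class SubElement(Base):
    __tablename__ = "subelement"
    id = Column(Integer, primary_key=True)
    child_id = Column(Integer, ForeignKey("child.id"))


engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

setup = Session()
setup.add(Parent(children=[Child(subelements=[SubElement()])]))
setup.commit()
setup.close()

# Without options, the collection is not loaded until it is accessed.
lazy_session = Session()
lazy_parent = lazy_session.query(Parent).one()
children_pending = "children" in inspect(lazy_parent).unloaded

# Chained options: joined loading for children, select-in loading one level deeper.
eager_session = Session()
eager_parent = (
    eager_session.query(Parent)
    .options(joinedload(Parent.children).selectinload(Child.subelements))
    .one()
)
children_loaded = "children" not in inspect(eager_parent).unloaded
subelements_loaded = "subelements" not in inspect(eager_parent.children[0]).unloaded
```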
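Using method chaining, the loader style of each link in the path is explicitly stated. The options applied to a lazily loaded collection stay attached to the object that was loaded with them: in the previous example, the subelements collection keeps its eager loading whenever the children collection is loaded again. This stays the case even if the above Parent object is accessed from a subsequent query that specifies a different set of options.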
To change the options on an existing object without expunging it and re-loading, they must be set explicitly in conjunction with the Query. A future SQLAlchemy release may add more alternatives to manipulating the loader options on already-loaded objects.
The scalar or collection attribute associated with a relationship contains a trigger which fires the first time the attribute is accessed.
This trigger typically issues a SQL call at the point of access in order to load the related object or objects. The exception is a simple many-to-one relationship whose target object is already present in the current Session and can be identified by its primary key alone; no SQL is emitted in that case. For this reason, while lazy loading can be expensive for related collections, in the case that one is loading lots of objects with simple many-to-ones against a relatively small set of possible target objects, lazy loading may be able to refer to these objects locally without emitting as many SELECT statements as there are parent objects. Lazy loading can be enabled for a given attribute that is normally configured in some other way using the lazyload() loader option. Eager loading avoids emitting a SELECT for each parent object; however, it requires that the attributes which are to be loaded be specified with the Query up front.
The problem of code that may access other attributes that were not eagerly loaded, where lazy loading is not desired, may be addressed using the raiseload strategy; this loader strategy replaces the behavior of lazy loading with an informative error being raised. For example, it is possible to set up only one attribute as eager loading, and all the rest as raise. To set up raiseload for only the Order objects, specify a full path with orm.
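The sketch below sets up one attribute as eager loading and all the rest as raise, using joinedload() together with raiseload("*"). The Order, Item and Customer classes are hypothetical and the database is an in-memory SQLite instance.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.exc import InvalidRequestError
from sqlalchemy.orm import joinedload, raiseload, relationship, sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # SQLAlchemy 1.3

Base = declarative_base()


class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)


class Order(Base):
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customer.id"))
    customer = relationship("Customer")
    items = relationship("Item")


class Item(Base):
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey("orders.id"))


engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

setup = Session()
setup.add(Order(customer=Customer(), items=[Item(), Item()]))
setup.commit()
setup.close()

session = Session()
order = (
    session.query(Order)
    .options(joinedload(Order.items), raiseload("*"))
    .one()
)

item_count = len(order.items)  # eagerly loaded, so no lazy load is needed

try:
    order.customer  # not eagerly loaded: raiseload turns the lazy load into an error
    customer_blocked = False
except InvalidRequestError:
    customer_blocked = True
```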
Joined eager loading can be set at the mapping level, or for a single query using the joinedload() loader option. For an attribute that is guaranteed to have an element, such as a many-to-one reference to a related object where the referencing foreign key is NOT NULL, the query can be made more efficient by using an inner join; this is available at the mapping level via the relationship. Older versions of SQLAlchemy would convert right-nested joins into subqueries in all cases.
For example, if the User object we loaded referred to three Address objects, the result of the SQL statement would have had three rows; yet the Query returns only one User object.
As additional rows are received for a User object just loaded in a previous row, the additional columns that refer to new Address objects are directed into additional results within the User.
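This row-collapsing behavior can be observed with a small sketch. The User and Address classes here are minimal hypothetical versions, and the example uses the Query API with joinedload() against an in-memory SQLite database.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import joinedload, relationship, sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # SQLAlchemy 1.3

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    addresses = relationship("Address")


class Address(Base):
    __tablename__ = "addresses"
    id = Column(Integer, primary_key=True)
    email = Column(String)
    user_id = Column(Integer, ForeignKey("users.id"))


engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

setup = Session()
setup.add(User(addresses=[Address(email=f"user{n}@example.com") for n in range(3)]))
setup.commit()
setup.close()

session = Session()
# The JOIN produces three rows, one per address, for the same user.
users = session.query(User).options(joinedload(User.addresses)).all()
user_count = len(users)
address_count = len(users[0].addresses)
```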