project operation

if we think of a relation as a table, the SELECT operation chooses some of the rows from the table while discarding other rows. the PROJECT operation, on the other hand, selects certain columns from the table and discards the other columns. if we are interested in only certain attributes of a relation, we use the PROJECT operation to project the relation over these attributes only. therefore, the result of the PROJECT operation can be visualized as a vertical partition of the relation into two relations:
one has the needed columns (attributes) and contains the result of the operation, and the other contains the discarded columns. for example, to list each employee's first and last name and salary, we can use the PROJECT operation as follows:
the general form of the project operation is
where is the symbol used to represent the project operation, and <attribute list> is the desired sublist of attributes from the attributes of relation . notice that is, in general, a relational algebra expression whose result is a relation, which in the simplest case is just the name of a database relation. the result of the project operation has only the attributes specified in <attribute list> in the same order as they appear in the list. hence, its degree is equal to the number of attributes in <attribute list>.
if the attribute list includes only nonkey attributes of , duplicate tuples are likely to occur. the PROJECT operation removes any duplicate tuples, so the result of the PROJECT operation is a set of distinct tuples, and hence a valid relation. this is known as duplicate elimination. for example, consider the following PROJECT operation:
the number of tuples in a relation resulting from a PROJECT operation is always less than or equal to the number of tuples in . if the projection list is a superkey of --that is, it includes some key of --the resulting relation has the same number of tuples as . moreover,
as long as <list2> contains the attributes in <list1>; otherwise; the left-hand side is an incorrect expression. it is also noteworthy that commutativity does not hold on PROJECT.
In sql, the PROJECT attribute list is specified in the SELECT clause of a query. for example, the following operation:
would correspond to the following sql query:
SELECT DISTINCT Sex, Salary FROM EMPLOYEE
notice that if we remove the keyword DISTINCT from this SQL query, then duplicates will not be eliminated. this option is not available in the formal relational algebra, but the algebra can be extended to include this operation and allow relations to be multisets; we do not discuss these extensions here.
[cite:@elmasri_db_2015 chapter 8.1.2 the PROJECT operation]