SQL to remove duplicated rows


Tag: postgresql-9.3

I've written an SQL statement that keeps only one instance (the one with the minimum id) wherever there are duplicated product_codes. The problem is that the statement is very inefficient and takes absolutely ages to run, so I'm hoping there is a more efficient way to write it.

The dataset is structured as:

id  product_code  cat_desc      product_desc  
1   2352345       423           COCA COLA   
2   8967896       457           FANTA   
3   6456466       435           SPARKLING WATER 
4   3562314       457           STILL WATER 

The statement is:

DELETE FROM raw_products_inter
WHERE id IN (SELECT id
             FROM raw_products_inter outer_table
             -- only product_codes that occur more than once
             WHERE product_code IN (SELECT product_code
                                    FROM raw_products_inter
                                    GROUP BY 1
                                    HAVING COUNT(id) > 1)
             -- keep the row with the smallest id for each product_code
             AND   id NOT IN (SELECT MIN(id)
                              FROM raw_products_inter inner_table
                              WHERE inner_table.product_code = outer_table.product_code));
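A minimal sketch for checking what the statement above removes, assuming a small hypothetical sample in which two product_codes are duplicated (the sample table shown earlier has no duplicates). It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL 9.3; the statement text runs unchanged on SQLite, but timings there say nothing about the PostgreSQL performance problem.

```python
import sqlite3

# Hypothetical sample data: product codes 2352345 and 8967896 appear more than
# once, so the delete has something to remove.
ROWS = [
    (1, 2352345, 423, "COCA COLA"),
    (2, 8967896, 457, "FANTA"),
    (3, 6456466, 435, "SPARKLING WATER"),
    (4, 3562314, 457, "STILL WATER"),
    (5, 2352345, 423, "COCA COLA"),
    (6, 8967896, 457, "FANTA"),
    (7, 2352345, 423, "COCA COLA"),
]

# The statement from the question; SQLite accepts the same text.
ORIGINAL_DELETE = """
DELETE FROM raw_products_inter
WHERE id IN (SELECT id
             FROM raw_products_inter outer_table
             WHERE product_code IN (SELECT product_code
                                    FROM raw_products_inter
                                    GROUP BY 1
                                    HAVING COUNT(id) > 1)
             AND   id NOT IN (SELECT MIN(id)
                              FROM raw_products_inter inner_table
                              WHERE inner_table.product_code = outer_table.product_code))
"""


def surviving_rows(delete_sql):
    """Load ROWS into an in-memory table, run delete_sql, return the remaining (id, product_code) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE raw_products_inter ("
            "id INTEGER PRIMARY KEY, product_code INTEGER, "
            "cat_desc INTEGER, product_desc TEXT)"
        )
        conn.executemany("INSERT INTO raw_products_inter VALUES (?, ?, ?, ?)", ROWS)
        conn.execute(delete_sql)
        return conn.execute(
            "SELECT id, product_code FROM raw_products_inter ORDER BY id"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # One row per product_code should remain, each with its smallest id.
    print(surviving_rows(ORIGINAL_DELETE))
```

With this sample, ids 5, 6 and 7 are removed and ids 1 to 4 remain.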


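You should be able to boost the performance using an EXISTS condition. Instead of building the list of duplicated product_codes with GROUP BY ... HAVING and then looking up MIN(id) for every candidate row, it deletes each row for which another row with the same product_code and a smaller id exists, so the row with the minimum id is the only one left for each product_code: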

DELETE FROM raw_products_inter P
 WHERE EXISTS (
          -- an older row (smaller id) with the same product_code
          SELECT *
            FROM raw_products_inter OP
           WHERE OP.product_code = P.product_code
             AND OP.id < P.id
       );
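A minimal sketch of the EXISTS approach on hypothetical sample data, again using Python's sqlite3 in-memory database as a stand-in for PostgreSQL. Two details are assumptions rather than part of the answer: the DELETE target is referenced by its table name instead of the alias P, so the same text also runs on SQLite, and an optional index on (product_code, id), with the hypothetical name raw_products_inter_code_id_idx, is created so that each EXISTS probe can be answered by an index lookup.

```python
import sqlite3

# Hypothetical sample data: product codes 2352345 and 8967896 are duplicated.
ROWS = [
    (1, 2352345, 423, "COCA COLA"),
    (2, 8967896, 457, "FANTA"),
    (3, 6456466, 435, "SPARKLING WATER"),
    (4, 3562314, 457, "STILL WATER"),
    (5, 2352345, 423, "COCA COLA"),
    (6, 8967896, 457, "FANTA"),
    (7, 2352345, 423, "COCA COLA"),
]

# Optional index (hypothetical name) covering the columns the EXISTS probe uses.
CREATE_INDEX = (
    "CREATE INDEX raw_products_inter_code_id_idx "
    "ON raw_products_inter (product_code, id)"
)

# EXISTS rewrite: delete a row whenever an older row (smaller id) with the same
# product_code exists. The target is referenced by table name, not the alias P,
# so the statement also runs on SQLite.
EXISTS_DELETE = """
DELETE FROM raw_products_inter
 WHERE EXISTS (SELECT 1
                 FROM raw_products_inter op
                WHERE op.product_code = raw_products_inter.product_code
                  AND op.id < raw_products_inter.id)
"""


def dedupe(rows, with_index=True):
    """Run EXISTS_DELETE against an in-memory copy of rows; return the remaining (id, product_code) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE raw_products_inter ("
            "id INTEGER PRIMARY KEY, product_code INTEGER, "
            "cat_desc INTEGER, product_desc TEXT)"
        )
        conn.executemany("INSERT INTO raw_products_inter VALUES (?, ?, ?, ?)", rows)
        if with_index:
            conn.execute(CREATE_INDEX)
        conn.execute(EXISTS_DELETE)
        return conn.execute(
            "SELECT id, product_code FROM raw_products_inter ORDER BY id"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(dedupe(ROWS))
```

On the real PostgreSQL 9.3 table the aliased form from the answer works as written. Whether the index pays off there is something to confirm with EXPLAIN ANALYZE, run inside a transaction that is rolled back, since EXPLAIN ANALYZE really executes the DELETE.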



What's the correct way to do IN (date-range) in Postgres?

What's the correct way to do this in Postgres? delete from days where date IN ("2014-02-15", "2014-02-07", "2014-02-08", "2014-02-09", "2014-03-01"); ERROR -- : PG::UndefinedColumn: ERROR: column "2014-02-15" does not exist works fine in MySQL and Sqlite3...

Error with single quotes inside text in select statement

Getting the error using Postgresql 9.3: select 'hjhjjjhjh'mnmnmnm'mn' Error: ERRO:syntax error in or next to "'mn'" SQL state: 42601 Character: 26 I tried replace single quote inside text with: select REGEXP_REPLACE('hjhjjjhjh'mnmnmnm'mn', '\\''+', '''', 'g') and select '$$hjhjjjhjh'mnmnmnm'mn$$' but it did not work. Below is the real code: CREATE OR REPLACE...

Postgres SQL, how to automatically increment ID when duplicate / insert between two sequential ID's?

I have a table with a SERIAL ID as primary key. As you know the serial id increments itself automatically, and I need this feature in my table. ID | info --------- 1 | xxx 2 | xxx 3 | xxx For ordering matters, I want to insert a row...

Error while concatenating plpgsql var with query on cursor statement

I am getting error trying concatenate the var sch in the second For: ERROR: syntax error in or next a "||" SQL state: 42601 Character: 1151 Does anyone know how to solve this problem concatenation? CREATE OR REPLACE FUNCTION generate_mallet_input2() RETURNS VOID AS $$ DECLARE sch name; r record; BEGIN...