Had a breakdown on one of those wrong-PK-configuration cases today.
It took 9 hours to get to the migration... and then it failed due to a non-unique PK.
You've added
[x] Check primary keys
But it is currently checking 100% of the records, which for me is overkill.
One of the tables has several billion records and weighs several thousand gigabytes.
Maybe you'd consider a light mode that checks only a few million records?
In my case it would have shown the problem beautifully.
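Something like this, perhaps (a rough T-SQL sketch of the idea; the table and key column names are made up, and this is not Jailer's actual query):

    -- Hypothetical light-mode check: look for duplicate key values
    -- in only the first few million rows (made-up names).
    SELECT pk_col1, pk_col2, COUNT(*) AS cnt
    FROM (SELECT TOP (5000000) pk_col1, pk_col2
          FROM dbo.big_table) AS sample
    GROUP BY pk_col1, pk_col2
    HAVING COUNT(*) > 1;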
Anonymous
Is there an index on the columns of the big table that have been configured as the PK?
An index should be created for these columns to speed up both the verification and the migration itself.
While it is true that an appropriate index should be defined for each PK configuration, creating an index just to determine whether a PK configuration is correct would be at least as expensive as running the check queries themselves.
I'll think about what to do.
For the time being, I have changed the PK check so that the queries stop immediately if an error is detected. In addition, the queries are processed in multiple threads (as many as specified in the "Parallel Threads" field).
I feel this is perfect. In my case it would have stopped very quickly = 9 hours of test time saved ;)
(not to speak of energy wasted and entropy increased)
An idea:
Instead of searching for an appropriate primary key configuration for tables without a primary key, you could give such tables a primary key column.
Ideally one with autogenerated values. So for example:
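(A minimal sketch, assuming SQL Server syntax; the table and column names are made up:)

    -- Sketch: add an auto-generated surrogate key column.
    -- dbo.my_table and surrogate_id are hypothetical names.
    ALTER TABLE dbo.my_table
        ADD surrogate_id BIGINT IDENTITY(1,1) NOT NULL;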
After a "Analyse Datebase", these columns would also automatically be recognized as a primary key.
Not an option for me, because the DBA gave me only read-only access to the production data...
DBAs are the natural enemies of the Jailer ;)
;)
Now I got a real treat: one day before vacation, almost everything done (thanks to one very unusual Ralf person),
and a 300 GB table with over a billion records, many of them full duplicates.
Just rows with the same values in all columns.
They go in pairs.
And they don't show up in the first 200M records; they start after that ;)
I'm sure it's a mistake on the app side (and a stupid non-unique key).
But that's a no-go for the beloved Jailer.
Any ideas on workaround? Quick hack?
Last edit: Alexander Petrossian (PAF) 2018-11-16
I see the code in upsert, and am considering hacking in a GROUP BY on all columns.
But I feel that may cause a considerable slowdown.
Please share your feelings about this. Maybe you've tried something like that and know it would fail?
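To make the idea concrete, roughly this query shape (table and column names are made up):

    -- Rough sketch of the GROUP BY idea (made-up names):
    -- grouping on every column collapses full duplicates.
    SELECT col_a, col_b, col_c
    FROM dbo.source_table
    GROUP BY col_a, col_b, col_c;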
I guess collect-duplicates were removed when we filled in JAILER_ENTITY,
so the only real problem is export-duplicates at the MERGE operation.
Are the duplicates in the source database?
The merge operations are only used if the target tables already contained data before the export. Otherwise, inserts will be executed.
What result do you get if the target tables are initially empty?
I didn't try that yet. But may try today.
As I understand it, it would do one insert, because there would be only one relevant entry in ENTITIES.
Great hint. Thanks, Ralf.
Does it matter if the other tables are filled? Is it enough if I truncate only the problematic one?
No, if there are duplicate rows in the source table, it would also insert duplicates into the target table. (Wouldn't that be the correct behavior?)
As a quick and dirty hack, you could try "insert into ... select distinct ...". To do so, apply the attached patch.
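Schematically, the statement would have this shape (made-up names; not the patch's exact SQL):

    -- Insert only distinct rows (hypothetical table/column names).
    INSERT INTO target_table (col_a, col_b, col_c)
    SELECT DISTINCT col_a, col_b, col_c
    FROM source_table;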
Thanks, I will try that (though I have big worries about a slowdown).
Just to clarify my understanding: when we fill the ENTITIES table, we also do the duplicate check. Could it be that that check deduplicates the input junk?
Also, I'm a bit at a loss about how the junk managed to get into the migrated table [it did! :(].
UPD: I feel I got it.
The duplicate check probably checks only against the previous day(s)?
So on the initial run, when the target table was empty, inserts were indeed used, and they put in those duplicates that I see there now (without errors).
Then I did subsequent runs with a nonempty target table, so Jailer switched to merge, and naturally I got errors.
(Am still a bit worried about a possible slowdown.)
I also suspect that the performance will suffer.
Is this a one-time migration, or should it be done on a regular basis?
If this is a one-time migration, would it be possible to clone the database and export the data out of the clone? Then it would be possible to insert an identity column, which would solve just about any problem.
Maybe you could also delete the duplicates in the target table after the export?
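For example, assuming the target is also SQL Server, one common pattern (with made-up names) would be:

    -- Delete full duplicates, keeping one row of each group.
    -- Assumes SQL Server; table and column names are made up.
    WITH numbered AS (
        SELECT ROW_NUMBER() OVER (
                   PARTITION BY col_a, col_b, col_c
                   ORDER BY (SELECT NULL)
               ) AS rn
        FROM target_table
    )
    DELETE FROM numbered WHERE rn > 1;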
They plan to do it regularly,
and I don't think they would risk changing the origin table.
They give me a frozen copy from production to work on, but I don't know their plans for the data flow.
One would be crazy to do this against production servers,
so hopefully there is a chance they would allow me to add an ID or cleanse the junk beforehand.
Thanks for the hints.
I'll report back if it turns out to be too slow. Worried about a temp-table nightmare...
Yes, ENTITIES contains keys without duplicates.
But when the rows are exported, ENTITIES is joined to the source table.
So if the source table already contains duplicates, several rows are also exported per ENTITIES element.
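Schematically (this is not Jailer's actual SQL, and the key column name is made up):

    -- ENTITIES holds each key once, but the join brings back
    -- every source row with that key, duplicates included.
    SELECT s.*
    FROM JAILER_ENTITY e
    JOIN source_table s ON s.pk_col = e.pk_col;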
got it!
Alas, it failed before even getting to my table, with a sad "data type can not be ... DISTINCT" error.
I'll try to limit it to only my one problematic table, just for a test.
My taxi to vacation comes in 5 hours, and soon there will be a break for us all ;)
Ralf (back in the office now; I'm afraid I'm going to bug you some more),
I found that DISTINCT for all tables does not work for me; see the strange "data type can not be ... DISTINCT" business above.
And anyway, I'm pretty sure it would slow things down considerably.
Maybe you would consider adding some table-level option that would do the "distinct" magic for a certain manually configured table?
So far I have seen only one table like this in my whole 4K+ table project.
(My idea is to switch to a release build and not give the customer anything hand-patched by me.)
Last edit: Alexander Petrossian (PAF) 2018-11-28
How many rows does the problematic table contain?
And which version of MSSQL do you use? (Maybe the pseudo column "rid" can be helpful.)
You could try the following:
- define a column %%physloc%% as the primary key of the problematic table. Type is "varbinary", length 8.
- define a filter that excludes this column from the export. (See screenshots attached).
see https://stackoverflow.com/questions/909155/equivalent-of-oracles-rowid-in-sql-server
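%%physloc%% is an undocumented SQL Server virtual column; selecting it looks like this (the table name is made up):

    -- %%physloc%% returns the physical row locator (file:page:slot).
    SELECT %%physloc%% AS row_locator, t.*
    FROM dbo.big_table AS t;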
Last edit: Ralf Wisser 2018-11-28
18K duplicates in 1.3B records.
0.001% (cries out loud)
version:
https://support.microsoft.com/en-us/help/4013104/cumulative-update-8-for-sql-server-2012-sp3
One cannot use it in a WHERE clause (invalid column name '%%physloc%%'), so I feel that idea would not work. Trying anyway...
Last edit: Alexander Petrossian (PAF) 2018-11-28