Add a new column based on subsets with Python

Category: python dataframe subset

I have a data file like this:

      id       numbers 
       1       10, 20, 40, 50   
       2       20, 30              
       3       20          
       4       40, 20            
       5       20, 40, 10
       6       100, 110, 120
       7       100, 110
       8       50, 40
       9       70
       10      70

I am trying to find the supersets and subsets in this dataset. For example:

  • '20', '40, 20', '50, 40', '20, 40, 10' are subsets of '10, 20, 40, 50',
  • '20' is a subset of '20, 30',
  • '100, 110' is a subset of '100, 110, 120',
  • '70' is a subset of '70'.

and

  • '10, 20, 40, 50', '20, 30', '100, 110, 120' and '70' are supersets.

I would like to show and store that relation in a separate Excel file, like this:

      id       numbers             subsets(id)
       1       10, 20, 40, 50        3, 4, 5, 8
       2       20, 30                3
       6       100, 110, 120         7
       9       70                    10

The subset rows should be removed from the Excel file, and the ids of the subsets should be stored in a new column for each superset.
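One possible way to get that table is sketched below. It is only a sketch under a few assumptions: the numbers column holds comma-separated integers, a row is dropped when another row strictly contains it, and when two rows hold identical sets the earlier one is kept as the superset (which is what the expected output suggests for ids 9 and 10). The function names `parse_numbers` and `find_supersets` are made up for illustration and do not come from any library.

```python
def parse_numbers(text):
    """Turn a string such as '10, 20, 40' into frozenset({10, 20, 40})."""
    return frozenset(int(part) for part in text.split(",") if part.strip())


def find_supersets(rows):
    """Keep only the maximal sets and list the ids of the rows they contain.

    rows: list of (id, numbers_text) pairs.
    Returns a list of (id, numbers_text, subset_ids_text) tuples.
    """
    parsed = [(row_id, text.strip(), parse_numbers(text)) for row_id, text in rows]

    def is_covered(i):
        # Assumption: a row is dropped if another row strictly contains it,
        # or if an earlier row holds exactly the same set.
        current = parsed[i][2]
        for j, (_, _, other) in enumerate(parsed):
            if j == i:
                continue
            if current < other or (current == other and j < i):
                return True
        return False

    result = []
    for i, (row_id, text, current) in enumerate(parsed):
        if is_covered(i):
            continue
        # Collect the ids of every other row whose set is contained in this one.
        subset_ids = [
            other_id
            for j, (other_id, _, other) in enumerate(parsed)
            if j != i and other <= current
        ]
        result.append((row_id, text, ", ".join(str(x) for x in subset_ids)))
    return result


# Sample data from the question.
rows = [
    (1, "10, 20, 40, 50"),
    (2, "20, 30"),
    (3, "20"),
    (4, "40, 20"),
    (5, "20, 40, 10"),
    (6, "100, 110, 120"),
    (7, "100, 110"),
    (8, "50, 40"),
    (9, "70"),
    (10, "70"),
]

result = find_supersets(rows)
for row in result:
    print(row)
```

This compares every pair of rows, so it is quadratic in the number of rows, which should be fine for small files. To store the result, the tuples could be loaded into a pandas DataFrame with columns such as id, numbers and subsets(id), and written out with `DataFrame.to_excel` (which needs an Excel writer package such as openpyxl installed).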
