sum and merge duplicated data


hi  experts

in  my  code  I try merging  duplicated data and  sum  the  values  I know  the  loop   this  is  not  right  way  to  do  that  because   I have  a real  file  contains  about  at least 10000 rows  and increasable   so   I attach  a simple  file  to  understand  my  idea    I put  the  expected result from colJ:M

and  this  is  what  I have  but  unfortinatly   not  works  correctly

Sub RunMe()
Dim mFind As Range
Dim arrCols As Variant

       arrCols = Array(1, 2, 3)

Columns("A:c").Copy Columns("E:G")
Columns("E:G").RemoveDuplicates Columns:=(arrCols), Header:=xlYes

For Each cell In Range("e2:g" & Range("e1").End(xlDown).Row)
    Set mFind = Columns("A:c").Find(what:=cell.Value, lookat:=xlWhole)
    fAddress = mFind.Address
        cell.Offset(0, 1).Value = cell.Offset(0, 3).Value + mFind.Offset(0, 3).Value
        Set mFind = Columns("A:c").FindNext(mFind)
    Loop While mFind.Address <> fAddress
Next cell
End Sub




Selected Answer


(Note that I've removed my earlier Answer and replaced it with one using an array solution for the speed needed for several thousands records).

The revised file attached has a new sheet "Sort many" with 2,000 records in columns A:D, column A is sorted randomly bi ID  (apart from the first few records- used for test purposes*). There's a green shape "Do sort and sum" which is assigned to the macro below...

"SortSumValues" (code below, commented for your guidance) is in Module 1, together with a function ArrayRemoveDups to remove duplicates from an array. 

Click the green shape "Do sort and sum" and the results will quickly appear in columns G:K. (On my old machine, it took less than one second to sort count 2,000 records with 1,001 unique values).

The right-most column K shows how many records matched the ID (the quantities are added in column J).

Here's the main code:

Option Base 1

Sub SortSumGroupedValues()
'Uses  Option Base 1 for ease of handling arrays
Dim SortArray As Variant, SortArrayTemp As Variant, Temp As Variant
Dim m As Integer, n As Integer, p As Integer, q As Integer
Dim OutRng As Range

Const deLim = "||" ' combination  of characters (hopefuly rare in cells) used to separate values,

Range("G:K").Clear ' remove previous results

' copy data into array
SortArrayTemp = Range("A2:D" & Range("A" & Rows.Count).End(xlUp).Row).Value

' pack three columns into first cell, using delimiter
For m = LBound(SortArrayTemp, 1) To UBound(SortArrayTemp, 1)
    SortArrayTemp(m, 1) = SortArrayTemp(m, 1) & deLim & SortArrayTemp(m, 2) & deLim & SortArrayTemp(m, 3)
Next m

'call function to remove duplicates (via conversion to Collection)
SortArray = ArrayRemoveDups(SortArrayTemp) ' returns 1-D array

'bubble sort SortArray (= unique values from column 1)
For m = LBound(SortArray, 1) To (UBound(SortArray, 1) - 1)
    For n = (m + 1) To UBound(SortArray, 1)
        If SortArray(m) > SortArray(n) Then
            Temp = SortArray(m)
            SortArray(m) = SortArray(n)
            SortArray(n) = Temp
            Temp = ""
        End If
    Next n
Next m

'Expand SortArray to add a dimension (=make 4 columns for data + 1 for count)
    ReDim Temp(UBound(SortArray), 5)
    For m = LBound(SortArray) To UBound(SortArray)
        Temp(m, 1) = SortArray(m)
    Next m
    SortArray = Temp
    Temp = ""

' loop through data in temp array
For m = LBound(SortArrayTemp, 1) To UBound(SortArrayTemp, 1)
    'loop through data key on sorted array
    For n = LBound(SortArray, 1) To UBound(SortArray, 1) ' if data key found in sorted array
       'check if data key matches sorted key
       If SortArray(n, 1) = SortArrayTemp(m, 1) Then
          SortArray(n, 4) = SortArray(n, 4) + SortArrayTemp(m, 4) ' add values
          SortArray(n, 5) = SortArray(n, 5) + 1 'increase count
        End If
    Next n
Next m

' unpack first cell of each row to cells 3, 2 then 1 (removing delimiter)
For n = LBound(SortArray, 1) To UBound(SortArray, 1) '
    m = Len(deLim)
    p = InStr(SortArray(n, 1), deLim)
    q = InStrRev(SortArray(n, 1), deLim)
    SortArray(n, 3) = Right(SortArray(n, 1), Len(SortArray(n, 1)) - q - m + 1)
    SortArray(n, 2) = Mid(SortArray(n, 1), p + m, Len(SortArray(n, 1)) - q - m + 1)
    SortArray(n, 1) = Left(SortArray(n, 1), p - 1)
Next n

'copy headers
Range("G1:J1") = Range("A1:D1").Value 'copy headers
Range("k1") = "Records"
With Range("G1:K1")
    .Font.Bold = True
    .Interior.ColorIndex = 19
    .Borders.LineStyle = xlContinuous
End With

' display sorted and summed array
Set OutRng = Range("G2").Resize(UBound(SortArray, 1), 5)
With OutRng
    .Value = SortArray ' "paste" sorted array
    .Interior.ColorIndex = 35
    .Borders.LineStyle = xlContinuous
End With

MsgBox "Success- pasted " & UBound(SortArray, 1) & " records in G:K, adding quantities when columns 1 to 3 all match"

End Sub
Note that the general declaration "Option Base 1" MUST appear in the same module as this code (otherwise the array handling will mess up).

Revision 1: The code above (and attached file) is revised so that the records are summed only where columns 1 to 3 are the same (previously it assumed column A was unique and assumes B, C and D are the same for the same value of A). The parts in bold above show the key differences. Essentially I pack three columns into 1, using the pipe character | twice as a delimiter between the values. Once the sort and sum has been done on that combination, the macro  then unpacks them). This may be frowned on programmatically but it works!)

* The first 7 records have the same data in column A (so I could test the search loops) but columns B and C differ. ID AA-12-0 will show three sepaarte records (where B and C differ) with a total count of 7. You can easily check the quantity of that too.

Hope this is what you need.



It's  hard  work  to  do  that  and  take  more  time  from  you  
you did  unique  and  huge  work  Ireally   thank  you  for  this  a great  work 
you  give  me   a big  favor   well done !  I tested  your  code  in  my  real  file  contains about 10000 rows   it  gives  me  this  value  0.02  it's  very quick 
about  you  see  only  the  column  A  and the  data in columns B,C should  
be  are duplicted you're  right  when  I  ask  you   should  also  see B,C     I just  want  making  the  code  dynamically   sometimes   I  need  this  mod  in  the  futre  actually  i  found  difficult  to  understand  array  and  how  works   so  if  you  guide  me  how  mod  to  make   the  code  see COL A,B,C  Together  I tryly  appreciate  if  you  see  that is  complicated  I  satisfy   your  answering 
Kalil (rep: 4) Mar 19, '21 at 2:12 pm
Kalil. Thanks for selecting my Answer.

I'm afraid I can't understand your comments above (the lack on any punctuation doesn't help!) so I'm unable to respond.

Also I have no more time to spare on this.
John_Ru (rep: 1002) Mar 19, '21 at 2:21 pm

Please see Revision 1 to my Answer- this takes into account columns A B and C (summing quantiites only when all three match). I guess that's what you meant above.
John_Ru (rep: 1002) Mar 20, '21 at 9:27 am
Hi Kalil. Did that work for you?
John_Ru (rep: 1002) Mar 22, '21 at 8:22 am
Add to Discussion

I feel inclined to solve this problem without VBA.

  1. Add a helper column next to the existing Value column (D) with this formula =SUMIFS(D$2:D$7,A$2:A$7,A2)
    copied down.
  2. Change formulas in this column to values using Copy > Paste Special / Values
  3. Delete the original Value column.
  4. Delete duplicates using Data / Remove duplicates.

The SUMIFS formula can be extended to cover brands and types, if relevant. If you have to do this daily the task can be automated in VBA but applying the Excel tools described aboive, without the need for a loop.



thanks    but  as  I  said   my  data  are  a huge   and  increasable  so  I  see   it's  not  right  way  it  takes  from  me  more  time   if  I  drop  down    up  to  10000  rows  or  more 
Kalil (rep: 4) Mar 19, '21 at 12:25 am
Type A2:A10000 in the Name Bar, followed by Enter, and the specified range will be selected. You can use this method to copy a formula from one cell to 10,000 cells in under 5 seconds. The Name Bar is the field to the left of the Formula Bar at the top of your Excel screen where you normally see the address of the selected cell.
I strongly doubt that you can run a macro faster than implement my solution. I do agree that implementing a VBA version of my solution would be more convenient but it appears mandatory that no VBA version of it can ever come into existence while you reject the solution. So, consider it dead.
Variatus (rep: 4544) Mar 19, '21 at 6:59 am
Add to Discussion

Answer the Question

You must create an account to use the forum. Create an Account or Login