HPE Storage Users Group

A Storage Administrator Community




Author Message
 Post subject: Failed node on 8200
PostPosted: Mon Nov 02, 2020 4:18 am 

Joined: Mon Mar 04, 2019 8:57 am
Posts: 6
Hi All,

Maybe someone could help me reason this out. We have a node failure on one of our 8200s. The node failed more than a week ago (still waiting for a replacement from HPE :( ).

While the node is failed we are noticing that:

InSplore and graphs cannot be generated.

We are also noticing space issues. For instance, we have allocated 13TB of space from the storage (dedup RAID 6 volume) to our VMware cluster (VMFS 6, so unmap should run automatically). Of that 13TB we are consuming around 7TB.

From the storage side we are seeing allocated capacity increasing, with the compaction ratio going down and dedup back to 1:1.
Free space on this storage is now at 3% and constantly decreasing.

HPE are saying that this is normal growth; however, from the VMware side we are still seeing just 7TB being utilised. So how come the space consumed on the 3PAR is constantly increasing?


--Estimated(MB)---
RawFree UsableFree
 374400     249600
MTB-3PAR-M cli%
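For what it's worth, a rough time-to-full estimate can be made from the UsableFree figure above. This is only a sketch: the daily growth rate below is a hypothetical placeholder, and the real rate should be measured by comparing UsableFree between two samples taken a day apart.

```python
# Rough time-to-full estimate from the array's free-space estimate above.
usable_free_mb = 249600          # UsableFree from the output above, in MB
growth_mb_per_day = 50 * 1024    # hypothetical net growth of 50 GB/day

days_left = usable_free_mb / growth_mb_per_day
print(f"~{days_left:.1f} days of free space left at this growth rate")
```

At that assumed rate the array would be full in under a week, which is why the trend matters more than the snapshot.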

Just wondering if anyone has experienced a behaviour similar to ours and more importantly if we are risking that the storage goes out of space.

Thank you


Attachments:
3par.PNG [ 17.48 KiB ]
 Post subject: Re: Failed node on 8200
PostPosted: Mon Nov 02, 2020 8:10 am 

Joined: Mon Sep 21, 2015 2:11 pm
Posts: 1570
Location: Europe
I think HPE is forgetting something very important.

Garbage Collection for dedupe runs on all nodes in the cluster that aren't the master. When you have an 8200 (2-node system) and one node fails, how many nodes do you have left for Garbage Collection?

If you run out of space with dedupe, you're pretty much screwed... There is a way to manually remove and reduce spare chunklets so you get a little more space, but it seems to me your failed node needs replacing ASAP to get GC back up and running.

What 3PAR OS version are you running, and which version of dedupe? (3.3.1 introduced dedupe v3, which is shown by "showcpg -d" in the CLI under "shared version".)

_________________
The views and opinions expressed are my own and do not necessarily reflect those of my current or previous employers.


 Post subject: Re: Failed node on 8200
PostPosted: Mon Nov 02, 2020 9:02 am 

Joined: Mon Mar 04, 2019 8:57 am
Posts: 6
We have enquired about GC, but all they reported is that the system is stable. After almost 4 different agents we finally got hold of one who immediately identified the issue and said that GC is running, however very very slowly, approx 6GB every 30 mins. I'm not sure if GC behaviour has changed over versions; I say this because the support guy gave us the stats below, so it seems GC is still running but at a slower pace:
GC: Final Stats: Total space freed: 6 GB Total time taken: 1857 secs Dryrun: 0 Abandon run: 0

The dedup that is running is v2 and the 3PAR is on 3.3.1 MU2.
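A quick sanity check on that GC figure (a sketch, assuming the quoted run of 6 GB in 1857 seconds is representative of the sustained rate):

```python
# Back-of-the-envelope GC reclaim rate from the support stats above.
freed_gb = 6          # "Total space freed: 6 GB"
elapsed_s = 1857      # "Total time taken: 1857 secs"

per_30_min_gb = freed_gb / elapsed_s * 30 * 60   # matches "approx 6GB every 30mins"
per_day_gb = freed_gb / elapsed_s * 86400        # best-case reclaim per day

print(f"GC reclaims ~{per_30_min_gb:.1f} GB per 30 min, ~{per_day_gb:.0f} GB per day")
```

So if new writes plus dedup-store growth exceed roughly that amount per day, GC on the surviving node cannot keep up and free space will keep shrinking.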


 Post subject: Re: Failed node on 8200
PostPosted: Tue Nov 03, 2020 4:53 am 

Joined: Mon Sep 21, 2015 2:11 pm
Posts: 1570
Location: Europe
mulae wrote:
We have enquired about GC, but all they reported is that the system is stable. After almost 4 different agents we finally got hold of one who immediately identified the issue and said that GC is running, however very very slowly, approx 6GB every 30 mins. I'm not sure if GC behaviour has changed over versions; I say this because the support guy gave us the stats below, so it seems GC is still running but at a slower pace:
GC: Final Stats: Total space freed: 6 GB Total time taken: 1857 secs Dryrun: 0 Abandon run: 0

The dedup that is running is v2 and the 3PAR is on 3.3.1 MU2.


To me, uncontrolled growth is not a stable system.

The fact that you have TDVV2 doesn't make this a better scenario. If I were you I would push back on HPE on this... Uncontrolled growth due to poor GC suggests that your DDS is growing. With TDVV2, DDS compaction/reduction is like waiting for the polar ice cap to melt, so even when the node at some point gets replaced it will most likely take a very long time until you've regained the "blocked" space.

Not knowing a lot about your environment: do you have capacity somewhere else that would allow you to "start from scratch" and get TDVV3? That should greatly reduce the impact of the issue I think you are seeing, but it would require all dedupe volumes to be deleted or converted for a new TDVV3 DDS to be created. Considering you have less than 2% free capacity (and probably even less today), you don't have the free space needed for converting.

_________________
The views and opinions expressed are my own and do not necessarily reflect those of my current or previous employers.


 Post subject: Re: Failed node on 8200
PostPosted: Wed Nov 11, 2020 2:44 pm 

Joined: Mon Jan 13, 2014 11:58 am
Posts: 30
Location: Claremore, OK
Not related to the question, but I had the same issue getting a replacement node for one of my 8400s recently. After a week of non-stop complaining, I found out no one had been checking compatible part numbers.

Hopefully you have a replacement by now, but if not, that's why. It took less than an hour to get one sorted after they discovered the "Oopse".

It still took us four days to get the replacement from more or less across the street.

_________________
vSphere | Windows | Linux
2x 3Par 7400 | Brocade SAN

