glVertexAttribDivisor instructs program to advance this specific location by instance, not by vertex. So it will read the next value after all vertices have been rendered, and will start a new instance from the beginning of the same vertex buffers.
In this case the actual data for WVP and world matrices is update each frame to make instances change their location.
The function glVertexAttribDivisor() is what makes this an instance data rather than vertex data. It takes two parameters - the first one is the vertex array attribute and the second tells OpenGL the rate by which the attribute advances during instanced rendering. It basically means the number of times the entire set of vertices is rendered before the attribute is updated from the buffer. By default, the divisor is zero. This causes regular vertex attributes to be updated from vertex to vertex. If the divisor is 10 it means that the first 10 instances will use the first piece of data from the buffer, the next 10 instances will use the second, etc. We want to have a dedicated WVP matrix for each instance so we use a divisor of 1.
We repeat these steps for all four vertex array attributes of the matrix. We then do the same with the world matrix. Note that unlike the other vertex attributes such as the position and the normal we don't upload any data into the buffers. The reason is that the WVP and world matrices are dynamic and will be updated every frame. So we just set things up for later and leave the buffers uninitialized for now.
void Mesh::Render(unsigned int NumInstances, const Matrix4f* WVPMats, const Matrix4f* WorldMats)
{ 
    glBindBuffer(GL_ARRAY_BUFFER, m_Buffers[WVP_MAT_VB]);
    glBufferData(GL_ARRAY_BUFFER, sizeof(Matrix4f) * NumInstances, WVPMats, GL_DYNAMIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, m_Buffers[WORLD_MAT_VB]);
    glBufferData(GL_ARRAY_BUFFER, sizeof(Matrix4f) * NumInstances, WorldMats, GL_DYNAMIC_DRAW);
    glBindVertexArray(m_VAO);
    for (unsigned int i = 0 ; i < m_Entries.size() ; i++) {
        const unsigned int MaterialIndex = m_Entries[i].MaterialIndex;
        assert(MaterialIndex < m_Textures.size());
        if (m_Textures[MaterialIndex]) {
            m_Textures[MaterialIndex]->Bind(GL_TEXTURE0);
        }
        glDrawElementsInstancedBaseVertex(GL_TRIANGLES, 
                                         m_Entries[i].NumIndices, 
                                         GL_UNSIGNED_INT, 
                                         (void*)(sizeof(unsigned int) * m_Entries[i].BaseIndex), 
                                         NumInstances,
                                         m_Entries[i].BaseVertex);
    }
    // Make sure the VAO is not changed from the outside 
    glBindVertexArray(0);
}
Vertex shader.
Instead of getting the WVP and world matrics as uniform variables they are now coming in as regular vertex attributes. The VS doesn't care that their values will only be updated once per instance and not per vertex. As discussed above, the WVP matrix takes up locations 3-6 and the world matrix takes up locations 7-10.
The last line of the VS is where we see the second way of doing instanced rendering (the first being passing instance data as vertex attributes). 'gl_InstanceID' is a built-in variable which is available only in the VS. Since we plan to use it in the FS we have to access it here and pass it along in a regular output variable. The type of gl_InstanceID is an integer so we use an output variable of the same type. Since integers cannot be interpolated by the rasterizer we have to mark the output variable as 'flat' (forgetting to do that will trigger a compiler error).
#version 330
layout (location = 0) in vec3 Position; 
layout (location = 1) in vec2 TexCoord; 
layout (location = 2) in vec3 Normal; 
layout (location = 3) in mat4 WVP; 
layout (location = 7) in mat4 World; 
out vec2 TexCoord0; 
out vec3 Normal0; 
out vec3 WorldPos0; 
flat out int InstanceID; 
void main() 
{ 
    gl_Position = WVP * vec4(Position, 1.0); 
    TexCoord0 = TexCoord; 
    Normal0 = World * vec4(Normal, 0.0)).xyz; 
    WorldPos0 = World * vec4(Position, 1.0)).xyz; 
    InstanceID = gl_InstanceID; 
};
Fragment shader.
flat in int InstanceID;
...
uniform vec4 gColor[4];
...
void main() 
{ 
    vec3 Normal = normalize(Normal0); 
    vec4 TotalLight = CalcDirectionalLight(Normal); 
    for (int i = 0 ; i < gNumPointLights ; i++) { 
        TotalLight += CalcPointLight(gPointLights[i], Normal); 
    } 
    for (int i = 0 ; i < gNumSpotLights ; i++) { 
        TotalLight += CalcSpotLight(gSpotLights[i], Normal); 
    } 
    FragColor = texture(gColorMap, TexCoord0.xy) * TotalLight * gColor[InstanceID % 4];
};
Main render loop needs to transpose matrices (column-vector matrices).
Pipeline p;
p.SetCamera(m_pGameCamera->GetPos(), m_pGameCamera->GetTarget(), m_pGameCamera->GetUp());
p.SetPerspectiveProj(m_persProjInfo); 
p.Rotate(0.0f, 90.0f, 0.0f);
p.Scale(0.005f, 0.005f, 0.005f); 
Matrix4f WVPMatrics[NUM_INSTANCES];
Matrix4f WorldMatrices[NUM_INSTANCES];
for (unsigned int i = 0 ; i < NUM_INSTANCES ; i++) {
    Vector3f Pos(m_positions[i]);
    Pos.y += sinf(m_scale) * m_velocity[i];
    p.WorldPos(Pos); 
    WVPMatrics[i] = p.GetWVPTrans().Transpose();
    WorldMatrices[i] = p.GetWorldTrans().Transpose();
}
m_pMesh->Render(NUM_INSTANCES, WVPMatrics, WorldMatrices);
Buffer re-specification
This solution is to reallocate the buffer object before you start modifying it. This is termed buffer "orphaning". There are two ways to do it.
The first way is to call glBufferData with a NULL pointer, and the exact same size and usage hints it had before. This allows the implementation to simply reallocate storage for that buffer object under-the-hood. Since allocating storage is (likely) faster than the implicit synchronization, you gain significant performance advantages over synchronization. And since you passed NULL, if there wasn't a need for synchronization to begin with, this can be reduced to a no-op. 
Instanced arrays
Normally, vertex attribute arrays are indexed based on the index buffer, or when doing array rendering, once per vertex from the start point to the end. However, when doing instanced rendering, it is often useful to have an alternative means of getting per-instance data than accessing it directly in the shader via a Uniform Buffer Object, a Buffer Texture, or some other means. 
It is possible to have one or more attribute arrays indexed, not by the index buffer or direct array access, but by the instance count. This is done via this function:
The index is the attribute index to set. If divisor is zero, then the attribute acts like normal, being indexed by the array or index buffer. If divisor is non-zero, then the current instance is divided by this divisor, and the result of that is used to access the attribute array.
The "current instance" mentioned above starts at the base instance for instanced rendering, increasing by 1 for each instance in the draw call. Note that this is not how the gl_InstanceID is computed for Vertex Shaders; that is not affected by the base instance. If no base instance is specified, then the current instance starts with 0. 
This is generally considered the most efficient way of getting per-instance data to the vertex shader. However, it is also the most resource-constrained method in some respects. OpenGL implementations usually offer a fairly restricted number of vertex attributes (16 or so), and you will need some of these for the actual per-vertex data. So that leaves less room for your per-instance data. While the number of instances can be arbitrarily large (unlike UBO arrays), the amount of per-instance data is much smaller.
However, that should be plenty for a quaternion orientation and a position, for a simple transformation. That would even leave one float (the position only needs to be 3D) to provide a fragment shader an index to access an Array Texture. 
 
Matrix attributes
Attributes in GLSL can be of matrix types. However, our attribute binding functions only bind up to a dimensionality of 4. OpenGL solves this problem by converting matrix GLSL attributes into multiple attribute indices.
If you directly assign an attribute index to a matrix type, it implicitly takes up more than one attribute index. The number of attributes a matrix takes up depends on the number of columns of the matrix: a mat2 matrix will take 2, a mat2x4 matrix will take 2, while a mat4x2 will take 4. The size of each attribute is the number of rows of the matrix.
Each bound attribute in the VAO therefore fills in a single column, starting with the left-most and progressing right. Thus, if you have a 3x3 matrix, and you assign it to attribute index 3, it will naturally take attribute indices 3, 4, and 5. Each of these indices will be 3 elements in size. Attribute 3 is the first column, 4 is the second, and 5 is the last.
OpenGL will allocate locations for matrix attributes contiguously as above. So if you defined a 3x3 matrix, it will return one value, but the next two values are also valid, active attributes.
Double-precision matrices (where available) will take up twice as much space. So a dmat3x3 will take up 6 attribute indices, two for each column.
Matrix inputs take up one attribute index for every column. Array attributes take up one index per element, even if the array is a float and could have use up to 4 indices.
Double-precision input variables of double or dvec types always take up one attribute. Even if they are dvec4 .
These combine with each other. A mat2x4[2] array is broken up into four vec4 values, each of which is assigned an index. Thus, it takes up 4 indices; the first two indices specify the two columns of array index 0, and the next two indices specify the two columns of array index 1.
When an input requires multiple indices, it will always be assigned sequential indices starting from the given index. Consider:
layout(location = 3) in mat4 a_matrix;
a_matrix will be assigned attribute indices 3, 4, 5, and 6. This works regardless of what methods you use to assign vertex attribute indices to input variables.
Linking will fail if any index ranges collide. Thus, this will fail to link:
layout(location = 0) in mat4 a_matrix;
layout(location = 3) in vec4 a_vec;