Describe: Recognition of humans and their activities using video